% Seq Identity - How?
John H McDonald
mcdonald at strauss.udel.edu
Tue Feb 21 17:40:20 EST 1995
In article <3idhtc$nb9 at server.st.usm.edu>,
Shiao Y. Wang <sywang at whale.st.usm.edu> wrote:
>
>How does one calculate % sequence identity for multiple clones? We have 4
>cDNA clones sequenced and we need to report some measure of sequence
>identity. Is it the number of nucleotides shared divided by total number
>of nucleotides?
>For example:
>
>Clone 1 GGG CCC TTT A
>Clone 2 GGG TCC TTT A
>Clone 3 TGG CCC TTT A
>
>Are these three clones (28/30) 93% identical or (8/10) 80% identical or ??
One option is nucleotide diversity, which is "the average number of
nucleotide differences per site between two sequences" (M. Nei, Molecular
Evolutionary Genetics, p. 256). For each pair of sequences, count the
number of differences, sum these counts over all pairs, then divide by
the total number of bases compared. In your example, there is one
difference between clones 1 and 2, one between 1 and 3, and two between
2 and 3, so the diversity would be (1+1+2)/(10+10+10) = 4/30 = 0.13,
or an average pairwise identity of about 87%.
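
The pairwise calculation can be sketched in Python as follows. This is
only an illustration, assuming the sequences are already aligned, have
equal length, and contain no gaps; the function name
nucleotide_diversity is made up for this example and does not come from
any library.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of differences per site over all pairs of sequences.

    Assumption: sequences are pre-aligned, equal in length, and gap-free.
    Spaces (used here only to group codons) are ignored.
    """
    cleaned = [s.replace(" ", "").upper() for s in seqs]
    if len(cleaned) < 2:
        raise ValueError("need at least two sequences")
    length = len(cleaned[0])
    if any(len(s) != length for s in cleaned):
        raise ValueError("sequences must have the same length")

    differences = 0  # mismatches summed over all pairs
    compared = 0     # bases compared, summed over all pairs
    for a, b in combinations(cleaned, 2):
        differences += sum(x != y for x, y in zip(a, b))
        compared += length
    return differences / compared


clones = ["GGG CCC TTT A", "GGG TCC TTT A", "TGG CCC TTT A"]
pi = nucleotide_diversity(clones)
print(f"diversity = {pi:.2f}, mean pairwise identity = {100 * (1 - pi):.0f}%")
```

For the three clones in the question this gives 4 differences over 30
bases compared. Real data with gaps or ambiguous bases would need a rule
for which sites to count, which this sketch does not attempt.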
John H. McDonald
Department of Biology
University of Delaware