% Seq Identity - How?

John H McDonald mcdonald at strauss.udel.edu
Tue Feb 21 17:40:20 EST 1995

In article <3idhtc$nb9 at server.st.usm.edu>,
Shiao Y. Wang <sywang at whale.st.usm.edu> wrote:
>How does one calculate % sequence identity for multiple clones? We have 4
>cDNA clones sequenced and we need to report some measure of sequence
>identity. Is it the number of nucleotides shared divided by total number
>of nucleotides?
>For example:
>Clone 1     GGG CCC TTT A
>Clone 2     GGG TCC TTT A
>Clone 3     TGG CCC TTT A
>Are these three clones (28/30) 93% identical or (8/10) 80% identical or ??

You could use nucleotide diversity, which is "the average number of
nucleotide differences per site between two sequences" (M. Nei, Molecular
Evolutionary Genetics, p. 256).  For each pair of sequences, you count
the number of differences, sum these counts over all pairs, then divide
by the total number of bases compared across all pairs.  In your example,
there is one difference between clones 1 and 2, one between 1 and 3, and
two between 2 and 3, so the diversity would be
(1+1+2)/(10+10+10) = 4/30, or about 0.13.  Put the other way around,
that is an average pairwise identity of about 87%.
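
For anyone who wants to check the arithmetic on a larger set of clones,
here is a minimal Python sketch of the pairwise calculation described
above. It assumes the sequences are already aligned to the same length
and contain no gaps or ambiguity codes; the function names
(pairwise_differences, nucleotide_diversity) are just illustrative, not
from any particular package.

```python
from itertools import combinations


def pairwise_differences(a, b):
    """Count the sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def nucleotide_diversity(seqs):
    """Total pairwise differences divided by total bases compared."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    diffs = 0
    sites = 0
    for a, b in combinations(seqs, 2):
        diffs += pairwise_differences(a, b)
        sites += len(a)  # each pair contributes one full-length comparison
    return diffs / sites


# The three clones from the question, with the spacing removed.
clones = ["GGGCCCTTTA", "GGGTCCTTTA", "TGGCCCTTTA"]
pi = nucleotide_diversity(clones)
print(f"diversity = {pi:.2f}, average pairwise identity = {1 - pi:.0%}")
```

With the three example clones this gives 4 differences over 30 compared
bases, matching the hand calculation.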

John H. McDonald
Department of Biology
University of Delaware
