Sequence alignments -- where to chop?
Andrew J. Roger
aroger at ac.dal.ca
Fri Apr 19 12:37:51 EST 1996
Hi,
I have a related but slightly different problem. The program
PROTML (Adachi and Hasegawa's protein maximum likelihood program)
codes missing data as a 21st amino acid. Thus, if two taxa in your
alignment have incomplete sequences in the same region, the
overlapping N's or ?'s are counted as identical residues.
Clearly this will positively mislead any program into
showing that the two sequences with missing data are closely
related. However, in the case where only one sequence
has many N's, I do not see how the program will be positively
misled. I can see that the total likelihood of the data
will go down (relative to having the real sequence in the
region) because many relatively improbable changes will be incorporated
into the likelihood calculation. Can anyone see if the presence
of multiple N's in this situation will cause positively
misleading topologies to result? My belief is that the
only real effect will be to lengthen the branch leading
to the taxon with the missing data.
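
To make the two cases concrete, here is a toy sketch in Python. It is
not PROTML's likelihood calculation: it uses a plain pairwise
p-distance as a rough stand-in, and the function name, the "?" symbol
and the three short sequences are all made up for illustration. It
only shows how coding missing data as an extra state changes the
apparent similarity between sequences.

```python
def p_distance(a, b, missing="?", missing_as_state=True):
    """Proportion of differing sites between two aligned sequences.

    missing_as_state=True  : '?' is compared like any other residue
                             (the "21st amino acid" coding).
    missing_as_state=False : sites where either sequence has '?' are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = 0
    compared = 0
    for x, y in zip(a, b):
        if not missing_as_state and (x == missing or y == missing):
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else float("nan")


# Hypothetical toy alignment (not real data).
taxon1 = "MKV?????LT"   # incomplete sequence
taxon2 = "MRV?????IT"   # incomplete in the same region
taxon3 = "MKVAGHWELT"   # complete sequence

for name, (a, b) in {
    "taxon1 vs taxon2 (overlapping gaps)": (taxon1, taxon2),
    "taxon1 vs taxon3 (one-sided gap)": (taxon1, taxon3),
}.items():
    as_state = p_distance(a, b, missing_as_state=True)
    skipped = p_distance(a, b, missing_as_state=False)
    print(f"{name}: as 21st state = {as_state:.2f}, skipped = {skipped:.2f}")
```

In this toy case the overlapping gaps halve the apparent distance
between taxon1 and taxon2 (0.20 against 0.40 when the gap sites are
skipped), while the one-sided gap only inflates the distance from
taxon1 to taxon3 (0.50 against 0.00). That is at least consistent with
the long-branch expectation above, though a distance measure cannot
settle what a likelihood method does to the topology.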
Cheers
Andrew J. Roger