Regarding Keith's N-termini and C-termini comparisons discussed recently,
Steve Smith advised:
> One common way is to calculate a distance matrix from the
> overlapping regions, and use Neighbor Joining, or max-likelihood
> to resolve the phylogeny. Phylip 3.5c has a nice function
> for getting distances out of AA sequence.
> Parsimony is a problem when you don't have positional overlap across
> all members, but the distance methods should give you a good approximation.
> Just don't try to over-interpret your results. UPGMA clustering makes
> the least speculative statement about your data, and may be most
> Steve Smith
Without wanting to start any small or large battles, it seems
to me that distance methods, especially UPGMA, should be much more
prone to the general problem of partial representation in the
sequences. In parsimony, a stretch that is missing from a
particular sequence does not influence the resulting topology. That
doesn't mean that those regions are not informative for the relative
branching relationships of sequences that are represented, assuming
that there are at least 4 left. If most of the data in a matrix are
missing, there may well be problems with the heuristic algorithms that
are employed for parsimony searches, but it was my impression that
methods that make pairwise similarity estimates between sequences are
more prone to problems with missing data. Joe may very well correct
me if that impression is false.
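
To make the worry concrete, here is a rough Python sketch of a pairwise
distance computed only over the positions where both sequences are
present. It is my own illustration, not the Phylip distance routine:
it uses a plain proportion of differing sites with no substitution
model, and the toy alignment, the sequence names, and the function
names p_distance and distance_table are all made up for the example.

```python
from itertools import combinations

# Characters treated as "no data" at an alignment position (an assumption).
MISSING = set("-?")

def p_distance(a, b):
    """Proportion of differing sites over columns present in both sequences.

    Returns (distance, n_shared_sites); distance is None when the two
    sequences share no columns at all.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    shared = [(x, y) for x, y in zip(a, b)
              if x not in MISSING and y not in MISSING]
    if not shared:
        return None, 0
    diffs = sum(1 for x, y in shared if x != y)
    return diffs / len(shared), len(shared)

def distance_table(alignment):
    """Pairwise (distance, n_shared_sites) for every pair of sequences."""
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m, n in combinations(alignment, 2)}

# Hypothetical alignment: one N-terminal-only and one C-terminal-only fragment.
alignment = {
    "full":    "MKTAYIAKQR",
    "n_only":  "MKTAY-----",
    "c_only":  "-----IAKQK",
    "variant": "MRTAYIAKQK",
}

if __name__ == "__main__":
    for pair, (dist, n_sites) in distance_table(alignment).items():
        print(pair, dist, n_sites)
```

In this toy case the n_only/c_only pair shares no sites, so that cell of
the matrix has no estimate at all, and the other fragment pairs rest on
only 5 sites each. A clustering method has to do something with such
cells, whereas parsimony simply scores each column on whichever
sequences are present in it.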