Making alignments

Tandy Warnow tandy at central.cis.upenn.edu
Fri Jan 30 11:23:22 EST 1998


In your posting, you said (and I paraphrase)
that maximum parsimony requires that all
sites evolve under identical processes.
In fact, Chris Tuffley and Mike Steel
have recently proved a result that
is relevant to this discussion:

   if the sites evolve under independent
   processes, but not necessarily under
   any common mechanism, then maximum
   parsimony and maximum likelihood are
   exactly equivalent, which is to say
   that the rank ordering on the set of
   leaf-labelled tree topologies induced
   by MP is the same as the rank ordering
   induced by ML. 
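
To make "the rank ordering induced by MP" concrete, here is a
minimal sketch that scores the three unrooted topologies on four
taxa with Fitch's small-parsimony algorithm and sorts them by
score. The toy alignment, the taxon names A-D, and the function
names are illustrative assumptions, not anything taken from the
Tuffley and Steel paper.

```python
def fitch_site_score(tree, states):
    """Minimum number of substitutions for one site (Fitch's algorithm).

    tree   -- nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each leaf name to its nucleotide at this site
    """
    def post_order(node):
        if isinstance(node, str):              # leaf: its state set, zero cost
            return {states[node]}, 0
        left_set, left_cost = post_order(node[0])
        right_set, right_cost = post_order(node[1])
        common = left_set & right_set
        if common:                             # children agree: no extra change
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return post_order(tree)[1]


def parsimony_score(tree, alignment):
    """Sum the per-site Fitch scores over all columns of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site_score(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(n_sites)
    )


# Hypothetical toy data: four taxa, five sites.
ALIGNMENT = {"A": "AACGT", "B": "AACGA", "C": "GACTT", "D": "GATTA"}

# The three unrooted quartet topologies, each rooted arbitrarily
# (the Fitch score does not depend on where the root is placed).
QUARTET_TOPOLOGIES = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

# Rank ordering induced by MP: fewest substitutions first.
MP_RANKING = sorted(QUARTET_TOPOLOGIES, key=lambda t: parsimony_score(t, ALIGNMENT))
```

On this toy alignment the scores are 5, 6 and 7 for AB|CD, AC|BD
and AD|BC respectively. The result quoted above says that ML under
the no-common-mechanism model would order the topologies the same
way.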

What is meant by "no common mechanism" is that
every pair (edge, site) has its own
probability of substitution, and that, given that
a substitution occurs, the change between every
pair of nucleotides is equally likely. I suspect
that this model is more likely to be
biologically realistic than the usual Jukes-Cantor
model, which requires that the
sites evolve under identical processes.
Under this general model, it may not be
possible in all cases to obtain the true
tree, even given infinite data, no matter
what method is used (as was shown in earlier
papers by other authors), so that even
ML can be "inconsistent".
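
The model described above can be sketched as a small simulation.
This is only an illustration under stated assumptions: the
four-taxon tree, the node names, and in particular the choice to
draw each (edge, site) probability uniformly from [0, 0.75] are
hypothetical. The model itself puts no distribution on these
probabilities; it only says that each (edge, site) pair has its own.

```python
import random

NUCLEOTIDES = "ACGT"

# Hypothetical rooted tree, listed parent-before-child: r -> (u, v),
# u -> (A, B), v -> (C, D).
EDGES = [("r", "u"), ("r", "v"), ("u", "A"), ("u", "B"), ("v", "C"), ("v", "D")]


def evolve_site(edges, root, p_edge, rng):
    """Evolve one site down the tree and return the state at every node.

    p_edge maps each (parent, child) edge to the probability that the
    child's state differs from the parent's at this site. Given a change,
    each of the other three nucleotides is equally likely.
    """
    state = {root: rng.choice(NUCLEOTIDES)}
    for parent, child in edges:
        current = state[parent]
        if rng.random() < p_edge[(parent, child)]:
            current = rng.choice([n for n in NUCLEOTIDES if n != current])
        state[child] = current
    return state


def simulate(edges, root, leaves, n_sites, seed=0):
    """Simulate leaf sequences with an unrelated parameter set per site."""
    rng = random.Random(seed)
    columns = {leaf: [] for leaf in leaves}
    for _ in range(n_sites):
        # No common mechanism: a fresh probability for every (edge, site)
        # pair. The uniform draw is an arbitrary assumption for this sketch.
        p_edge = {edge: rng.uniform(0.0, 0.75) for edge in edges}
        state = evolve_site(edges, root, p_edge, rng)
        for leaf in leaves:
            columns[leaf].append(state[leaf])
    return {leaf: "".join(chars) for leaf, chars in columns.items()}
```

Calling simulate(EDGES, "r", ["A", "B", "C", "D"], n_sites=20)
returns one 20-character sequence per leaf. Reusing a single
probability per edge across all sites would instead give iid site
evolution of the Jukes-Cantor kind.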

Thus, for a biologically motivated model of site
substitution, MP is the same as ML, and
consequently MP is consistent on exactly the
set of trees on which ML is consistent.
The results (originally by
Felsenstein, and observed by others) that
MP is not consistent under the Jukes-Cantor
model of evolution do not contradict this
result. If it is possible to constrain the
space of model trees to only those that have
iid site evolution, then ML can select the
correct tree (as can distance methods
using corrected distances). Without such
constraints, however, even ML can be
inconsistent, even if the general
properties of the model are known.

In a basic sense, this result does give
maximum parsimony a "statistical basis",
and the question may really come down to
figuring out what the properties of
real biological data are likely to be,
so that for particular data sets the
appropriate methods can be selected.

At the same time, to the extent that
the objective is more than just getting
the tree topology, ML will always be
useful in ways that MP cannot be.
For those people who primarily seek
the tree topology, however, MP may be
the "right" way to go, unless additional
properties of the evolutionary
process can be inferred that narrow
the search space (and hence make
MP not equivalent to ML).

I suggest that you see the Tuffley and
Steel paper:

     Chris Tuffley and Michael Steel,
     "Links between maximum likelihood and
      maximum parsimony under a simple model
      of site substitution"
     Bulletin of Mathematical Biology,
     59(3), 581-607, 1997.

Tandy Warnow
University of Pennsylvania
Department of Computer and Information Science
tandy at central.cis.upenn.edu



