
Making alignments

newsmgr at merrimack.edu
Sat Jan 24 15:53:50 EST 1998

Newsgroups: bionet.molbio.evolution
Subject: Re: Making alignments
Message-ID: <34C9F334.DFF at evol5.mbl.edu>
From: "Andrew J. Roger" <roger at evol5.mbl.edu>
Date: Sat, 24 Jan 1998 09:57:06 -0400
Reply-To: roger at evol5.mbl.edu
References: <6a5f0j$6fg at net.bio.net>
Organization: Woods Hole Oceanographic Institution

Guy A. Hoelzer wrote:
> In article <6a575g$gpa at net.bio.net>, Joe Staton <jstaton at oeb.harvard.edu> wrote:
> > The model of evolution is another consideration, altogether (i.e.,
> > parsimony) which MAY be more robust to giving information about
> > evolutionary patterns even if not completely correct. (no flames please)
> I think the question of robustness to model assumptions is still a wide
> open question.  At this point, everyone I talk to seems to think that their
> approach (MP vs. ML) is more robust, but I am not aware of any direct
> comparisons.  

Actually, there are tons of comparisons in the molecular evolution
literature.  Just look for papers by Hillis, Huelsenbeck, Kuhner and
Felsenstein, Yang, and Goldman.  Many of their recent papers deal with
how well the different methods perform when data are simulated under
different sets of models (that is, performance when the model of
simulation is the same as, or different from, the model used for ML
or distance correction).

> However, we are now able to test the basic assumptions of
> the MP model without resorting to induction as is currently done in the ML
> approach.  This is a "shameless plug", but I think it is on point.  I have
> been involved in the development of a method called Relative Apparent
> Synapomorphy Analysis (RASA), which deductively measures matrix
> hierarchy.  If hierarchy is present, and was caused by phylogenetic
> history, then MP will tend to give you the right tree.  IMHO, phylogenetic
> history is the only process that can create matrix hierarchy.  Processes
> like convergent selection or homoplastic evolution destroy that
> hierarchy.  In fact, if there is lots of good, clean hierarchy in the
> matrix, nearly every method with minimal evolutionary model assumptions
> (e.g., NJ, UPGMA) tends to get the tree correct.  IMHO, the value of ML in
> phylogeny estimation will eventually be greatly improved by reducing
> reliance on inductive procedures.

I think all methods, including ML, will give the correct answer when
"clean hierarchy" is present.  The question, therefore, should be which
methods are likely to outperform the others when homoplasy starts to
decay the hierarchy.  The problem with MP and with simpler models of
evolution (used in distance and ML analysis -- e.g. Jukes & Cantor) is
that particular biases in the way homoplasy occurs will be "overlooked".
For instance, if there is heterogeneity in rates among sites and the
model used in the method ignores this process, then there will be
systematic underestimation of branch lengths during the ML calculations.
This in turn can lead to the familiar long-branch attraction phenomenon
(if overall rates differ among lineages).  Unweighted MP implicitly
assumes that all sites are evolving in more or less the same way --
thus similar problems can arise.  Simple distance corrections will
underestimate pairwise distances under conditions of rate
heterogeneity.  However, if rate heterogeneity is built into the
model of evolution, then parameter estimates will not necessarily be
biased and the chances of getting the correct tree increase.
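
As a rough illustration of the distance-underestimation point, here is a
minimal Python sketch.  It assumes the Jukes & Cantor model with
gamma-distributed rates across sites (shape parameter alpha, mean rate
1) and uses the standard closed-form expressions for the expected
proportion of differing sites and for the two distance corrections.  The
function names and the value alpha = 0.5 are my own choices for
illustration, not anything taken from a particular program.

```python
import math

def p_expected_gamma(d, alpha):
    """Expected proportion of differing sites at true distance d under
    Jukes-Cantor with gamma-distributed site rates (shape alpha, mean 1)."""
    return 0.75 * (1.0 - (1.0 + 4.0 * d / (3.0 * alpha)) ** (-alpha))

def jc_distance(p):
    """Plain Jukes-Cantor correction: assumes all sites share one rate."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc_gamma_distance(p, alpha):
    """Jukes-Cantor correction that allows gamma rate heterogeneity."""
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)

alpha = 0.5  # assumed shape parameter (strong rate heterogeneity)
for d in (0.1, 0.5, 1.0, 2.0):
    p = p_expected_gamma(d, alpha)
    print(f"true d = {d:.1f}   p = {p:.3f}   "
          f"JC = {jc_distance(p):.3f}   "
          f"JC+gamma = {jc_gamma_distance(p, alpha):.3f}")
```

Under these assumptions the plain correction falls short of the true
distance, and the shortfall grows with divergence, while the
gamma-aware correction recovers the distance used to generate p.  This
covers expected values only; it says nothing yet about sampling error.
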
	In this context, "inductive" procedures (by which I assume you mean
estimating parameters for rate heterogeneity, estimating
branch lengths, etc.) potentially allow biases in the way homoplasy
occurs to be accounted for.  Signal can then be detected over the
"noise" that obscures the hierarchy you describe.
	I do agree with you that ML should not be portrayed as a "cure-all".
The problem with using ML is that one must estimate many parameters
from what is usually a small amount of data.  The more complex the
model used, the more parameters must be estimated.  This surely
increases the "random error" in the phylogeny estimation, which, I
expect, would decrease the efficiency of the method.  I'm not sure what
simulations have shown in this regard -- but I'm guessing that in cases
where the overall divergence between sequences is high and there are few
alignment positions, methods such as weighted parsimony or some simple
distance method will outperform ML methods.

Does anyone know if this is true?
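
To put rough numbers on the random-error worry, here is a small Python
sketch for the simplest case I can think of: a single pairwise distance.
It uses the usual large-sample (delta-method) approximation to the
sampling variance of the Jukes & Cantor distance, with and without a
gamma correction for rate heterogeneity.  The function names, the 200
sites, and alpha = 0.5 are assumptions made for illustration, and alpha
is treated as known, which if anything flatters the richer model.  It is
not a comparison of tree-building methods.

```python
import math

def var_jc(p, n):
    """Approximate sampling variance of the plain Jukes-Cantor distance,
    given observed proportion of differences p over n sites."""
    return p * (1.0 - p) / (n * (1.0 - 4.0 * p / 3.0) ** 2)

def var_jc_gamma(p, n, alpha):
    """Approximate sampling variance of the gamma-corrected Jukes-Cantor
    distance (shape alpha treated as known, not estimated)."""
    slope = (1.0 - 4.0 * p / 3.0) ** (-(1.0 + 1.0 / alpha))
    return p * (1.0 - p) / n * slope ** 2

n_sites = 200  # assumed short alignment
alpha = 0.5    # assumed gamma shape parameter
for p in (0.2, 0.4, 0.6):
    se_plain = math.sqrt(var_jc(p, n_sites))
    se_gamma = math.sqrt(var_jc_gamma(p, n_sites, alpha))
    print(f"p = {p:.1f}   SE(JC) = {se_plain:.3f}   "
          f"SE(JC+gamma) = {se_gamma:.3f}")
```

In this approximation the gamma-corrected distance always has the
larger standard error, and the gap widens as p approaches saturation or
as the number of sites drops.  Whether that extra variance is enough to
let simpler methods win on tree accuracy is exactly the open question.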

Andrew J. Roger
Marine Biological Laboratory
Woods Hole, MA
USA, 02543
