
Making alignments

Andrew Rambaut andrew.rambaut at zoology.oxford.ac.uk
Fri Jan 23 11:09:03 EST 1998

>This seems to me to be an overly optimistic view of the value of maximum
>likelihood approaches.  In general, maximum likelihood is vulnerable to
>errors in the chosen model of evolution and in the parameter estimates
>that are used.  Granted, there are ways to test which of a limited set of
>models best explains the data and parameter values can be estimated (with
>error) from the data.  However, it is a certainty that the perfectly
>"correct" model and parameter values will never be tested.  The
>magnitude of the resulting errors in phylogeny estimation remains unknown
>in particular cases.  While it is possible that a thorough ML analysis
>might usually yield a tree that is close to the correct one, I worry that
>overly optimistic presentations of the power of ML will lead those less
>familiar than Dr. Goldman with the limitations of ML to develop too much
>confidence in those results.

Take a tree like this (a classic MP Felsenstein Zone case, I believe):

tip1 \
        \   tip2
         \ /
         / \
        /   tip3

Then I set the long branches to 1.0 subst/site and the short branches 
(including the internal one) to 0.1 subst/site. I also created another tree 
where the long branches are 5.0 subst/site (fairly well saturated). I 
simulated sequences using my program Seq-Gen (HKY model, ts/tv = 2.0, equal 
base frequencies). For each tree I generated data sets of 500 bp and 5000 bp, 
100 replicates each.
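To show why 5.0 subst/site counts as "fairly well saturated", here is a rough sketch (not part of the original post) of the expected fraction of differing sites between the two long-branch tips. It assumes that HKY with equal base frequencies reduces to Kimura's two-parameter (K80) model, that ts/tv = 2.0 corresponds to a rate ratio kappa = 4, and that the two long branches are separated by the internal branch (path = long + 0.1 + long). The function names are hypothetical.

```python
import math

def k80_expected_differences(d, kappa):
    """Expected transition (P) and transversion (Q) proportions between two
    sequences separated by d substitutions/site under K80, where kappa is the
    transition/transversion rate ratio."""
    # Scale rates so that total substitutions/site = alpha*t + 2*beta*t = d.
    beta_t = d / (kappa + 2.0)
    alpha_t = kappa * beta_t
    p_ts = 0.25 + 0.25 * math.exp(-4.0 * beta_t) - 0.5 * math.exp(-2.0 * (alpha_t + beta_t))
    q_tv = 0.5 - 0.5 * math.exp(-4.0 * beta_t)
    return p_ts, q_tv

def k80_distance(p_ts, q_tv):
    """Kimura (1980) two-parameter distance recovered from P and Q."""
    return -0.5 * math.log(1.0 - 2.0 * p_ts - q_tv) - 0.25 * math.log(1.0 - 2.0 * q_tv)

KAPPA = 4.0  # assumption: gives ts/tv = 2.0 when base frequencies are equal
for long_branch in (1.0, 5.0):
    # Assumed path between the two long-branch tips: long + internal + long.
    path = 2 * long_branch + 0.1
    p_ts, q_tv = k80_expected_differences(path, KAPPA)
    print(f"path {path:4.1f}: expected p-distance {p_ts + q_tv:.3f}")
```

Under these assumptions the 1.0 tree gives roughly 0.67 expected differences per site, while the 5.0 tree sits within about 0.001 of the 0.75 ceiling expected for unrelated sequences, so very little signal is left to place the long branches.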

I then computed the likelihood of each of the tree topologies and ran a 
Hasegawa test on them. The results were:
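Assuming "Hasegawa test" here means the Kishino-Hasegawa (KH) test, a minimal sketch of its usual normal-approximation form: the log-likelihood difference between two topologies is divided by a standard error estimated from the per-site log-likelihood differences. The function name and the inputs are hypothetical, and this is not necessarily how the results below were computed.

```python
import math
from statistics import NormalDist

def kh_test(site_lnl_a, site_lnl_b, alpha=0.05):
    """Compare two topologies scored on the same alignment.

    site_lnl_a, site_lnl_b: per-site log-likelihoods under each tree.
    Returns (delta, z, significant) where delta = lnL(a) - lnL(b).
    """
    n = len(site_lnl_a)
    if n != len(site_lnl_b) or n < 2:
        raise ValueError("need matching per-site values for at least 2 sites")
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    delta = sum(diffs)
    mean = delta / n
    # Variance of the summed difference, estimated from the per-site spread.
    var = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    if var == 0.0:
        z = 0.0 if delta == 0.0 else math.copysign(math.inf, delta)
    else:
        z = delta / math.sqrt(var)
    # Two-sided test against the standard normal distribution.
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return delta, z, abs(z) > critical

# Made-up per-site log-likelihoods for two trees on a 4-site alignment.
delta, z, significant = kh_test([-5.0, -4.0, -6.0, -5.5], [-5.5, -4.2, -6.1, -6.0])
print(f"delta = {delta:.2f}, z = {z:.2f}, significant: {significant}")
```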

Tree 1 (long branches = 1.0, short = 0.1)
     500 bp:  58 correct trees (0 significantly right), 42 wrong (0 significantly wrong)
     5000 bp: 98 correct trees (47 significantly right), 2 wrong (0 significantly wrong)

Tree 2 (long branches = 5.0, short = 0.1)
     500 bp:  56 correct trees (0 significantly right), 44 wrong (0 significantly wrong)
     5000 bp: 61 correct trees (0 significantly right), 39 wrong (0 significantly wrong)

By "significantly right" I mean that the wrong trees are rejected in 
favour of the right one; by "significantly wrong" I mean that the right tree 
is rejected in favour of one of the wrong ones.
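The two definitions can be written as a small tallying sketch. It assumes, hypothetically, that each replicate is summarised by one KH-style z-score per wrong topology (positive when the true tree has the higher log-likelihood) and uses a two-sided 5% cutoff of 1.96; the names are illustrative only.

```python
from dataclasses import dataclass

Z_CRIT = 1.96  # assumed two-sided 5% critical value for a normal z-score

@dataclass
class Tally:
    correct: int = 0
    significantly_right: int = 0
    wrong: int = 0
    significantly_wrong: int = 0

def tally_replicates(replicates, z_crit=Z_CRIT):
    """Each replicate is a sequence of z-scores, one per wrong topology,
    each positive when the true tree has the higher log-likelihood."""
    t = Tally()
    for zs in replicates:
        if all(z > 0 for z in zs):
            # ML picked the true tree.
            t.correct += 1
            # Significantly right: every wrong tree is rejected in favour of the true one.
            if all(z > z_crit for z in zs):
                t.significantly_right += 1
        else:
            # ML picked (or tied with) a wrong tree.
            t.wrong += 1
            # Significantly wrong: the true tree is rejected in favour of some wrong tree.
            if any(z < -z_crit for z in zs):
                t.significantly_wrong += 1
    return t

# Four made-up replicates, each with z-scores against every wrong topology.
example = [(2.5, 3.1), (0.4, 1.2), (-0.7, 0.9), (-2.3, 0.5)]
print(tally_replicates(example))
```

Because a "significantly wrong" replicate needs some wrong tree to beat the true one, it can only occur among the "wrong" replicates, which matches how the counts above are grouped.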

These are just some quick and dirty results, so don't quote me on them. 
They go to show that saturation causes uncertainty in ML phylogenetic 
estimation: with long branches of 5.0 subst/site, even 5000 bp gave only 61 
correct trees out of 100 and no significant results either way. Although you 
can get wrong answers, you don't get significantly wrong ones, which is what 
we want in a phylogenetic method. Of course these simulations assume a model 
of substitution that we know to be correct (I simulated it that way). My 
point really is
that if you assume a model and the assumptions are wrong, you will get 
invalid results. The benefit of ML is that the model is explicit and the 
assumptions can be tested (see Goldman, 1993, for example - actually both of 
his 1993 papers are ...


  Andrew Rambaut,             EMAIL - andrew.rambaut at zoo.ox.ac.uk
  Zoology Department,           WWW - http://evolve.zoo.ox.ac.uk/
  University of Oxford,         TEL - +44 1865 271272
  South Parks Road, Oxford, UK  FAX - +44 1865 271249

More information about the Mol-evol mailing list
