>This seems to me to be an overly optimistic view of the value of maximum
>likelihood approaches. In general, maximum likelihood is vulnerable to
>errors in the chosen model of evolution and in the parameter estimates
>that are used. Granted, there are ways to test which of a limited set of
>models best explains the data and parameter values can be estimated (with
>error) from the data. However, it is a certainty that the perfectly
>"correct" model and parameter values will never be tested. The
>magnitude of the resulting errors in phylogeny estimation remains unknown
>in particular cases. While it is possible that a thorough ML analysis
>might usually yield a tree that is close to the correct one, I worry that
>overly optimistic presentations of the power of ML will lead those less
>familiar than Dr. Goldman with the limitations of ML to develop too much
>confidence in those results.
Take a tree like this (a classic MP Felsenstein Zone case, I believe):
    tip1
       \
        \
         \
          \   tip2
           \ /
            |
           / \
          /   tip3
         /
        /
       /
    tip4
Then I set the long branches to 1.0 subst/site and the short branches
(including the internal one) to 0.1 subst/site. I also created another
tree where the long branches are 5.0 subst/site (fairly well saturated).
I generated sequences using my program Seq-Gen (HKY model, ts/tv 2.0,
equal base freqs). For each tree I generated data sets of 500 bp and of
5000 bp, 100 replicates of each. I then checked the likelihood of each of
the three possible tree topologies and ran a Kishino-Hasegawa test on
them. The results were:
Tree 1 (long branches = 1.0, short = 0.1)
   500 bp: 58 correct trees (0 significantly right),
           42 wrong (0 significantly wrong)
  5000 bp: 98 correct trees (47 significantly right),
            2 wrong (0 significantly wrong)

Tree 2 (long branches = 5.0, short = 0.1)
   500 bp: 56 correct trees (0 significantly right),
           44 wrong (0 significantly wrong)
  5000 bp: 61 correct trees (0 significantly right),
           39 wrong (0 significantly wrong)
By "significantly right", I mean that the wrong trees are rejected in
favour of the right one.
By "significantly wrong", I mean that the right tree is rejected in
favour of one of the wrong ones.
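
With only 100 replicates per condition, the counts above carry a fair
amount of sampling noise. As a rough check (not part of the original
analysis), the sketch below puts an approximate 95% Wilson score interval
around each proportion of correct trees. The function name
wilson_interval and the choice of the Wilson interval are assumptions made
here for illustration; the counts are copied from the results above.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Number of replicates (out of 100) in which the correct tree had the
# highest likelihood, copied from the results listed above.
results = {
    ("Tree 1", 500): 58,
    ("Tree 1", 5000): 98,
    ("Tree 2", 500): 56,
    ("Tree 2", 5000): 61,
}

intervals = {key: wilson_interval(k, 100) for key, k in results.items()}

for (tree, length), (low, high) in intervals.items():
    print(f"{tree}, {length} bp: {results[(tree, length)]}/100 correct, "
          f"approx. 95% interval {low:.2f} to {high:.2f}")
```

The intervals for Tree 2 at 500 bp and 5000 bp overlap, so the step from
56 to 61 correct trees is within sampling noise, whereas the Tree 1
intervals at the two lengths are clearly separated.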
These are just some quick and dirty results, so don't quote me on them.
They do show that saturation causes uncertainty in ML phylogenetic
estimation. Although you can get wrong answers, you don't get
significantly wrong ones, which is what we want in a phylogenetic
method. Of course, these simulations assume a model of substitution that
we know to be correct (I simulated it that way). My point really is that
if you assume a model and its assumptions are wrong, you will get invalid
results. The benefit of ML is that the model is explicit and the
assumptions are testable (see Goldman, 1993 for example; actually both of
his 1993 papers are good).
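
To give a feel for what "fairly well saturated" means for the branch
lengths used above, here is a minimal sketch of the substitution
probabilities along a single branch. It relies on two assumptions that are
not stated in the post: that with equal base frequencies HKY reduces to
Kimura's two-parameter model, whose closed-form probabilities are used
here, and that "ts/tv 2.0" is the expected ratio of transitions to
transversions, so that the rate ratio is kappa = 2 * tstv. The function
name k80_probs is made up for this example and is not part of Seq-Gen.

```python
import math

def k80_probs(d, tstv=2.0):
    """Return (P(same), P(transition), P(any transversion)) after d subst/site.

    Assumes equal base frequencies (Kimura two-parameter model) and assumes
    kappa = 2 * tstv converts a ts/tv ratio into a rate ratio.
    """
    kappa = 2.0 * tstv
    e1 = math.exp(-4.0 * d / (kappa + 2.0))
    e2 = math.exp(-2.0 * d * (kappa + 1.0) / (kappa + 2.0))
    p_same = 0.25 + 0.25 * e1 + 0.5 * e2
    p_ts = 0.25 + 0.25 * e1 - 0.5 * e2
    p_tv = 0.5 - 0.5 * e1  # both possible transversion targets combined
    return p_same, p_ts, p_tv

# Branch lengths used in the two simulated trees.
for d in (0.1, 1.0, 5.0):
    p_same, p_ts, p_tv = k80_probs(d)
    print(f"d = {d}: P(same) = {p_same:.3f}, "
          f"P(ts) = {p_ts:.3f}, P(tv) = {p_tv:.3f}")
```

Under these assumptions, the probability that a site at the end of a
5.0 subst/site branch still shows its starting base is close to 0.25,
the value expected for unrelated sequences, which is why the long
branches in Tree 2 carry so little signal.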
Andrew
===================================================================
Andrew Rambaut, EMAIL - andrew.rambaut at zoo.ox.ac.uk
Zoology Department, WWW - http://evolve.zoo.ox.ac.uk/
University of Oxford, TEL - +44 1865 271272
South Parks Road, Oxford, UK FAX - +44 1865 271249
===================================================================