Making alignments

Guy A. Hoelzer hoelzer at
Fri Jan 23 13:07:26 EST 1998

In article <6aafav$2rj at>, Andrew Rambaut
<andrew.rambaut at> wrote:

> Take a tree like this (a classic MP Felsenstein Zone case, I believe):

> tip1 \
>       \
>        \
>         \   tip2
>          \ /
>           |
>          / \
>         /   tip3
>        /
>       /
>      /
> tip4

> Then I set the long branches to 1.0 subst/site and the short branches
> (including the internal one) to 0.1 subst/site. I also created another tree
> in which the long branches are 5.0 subst/site (fairly well saturated). I generated
> sequences using my program Seq-Gen (HKY model, ts/tv = 2.0, equal base frequencies).
> For each tree I generated two data sets, of 500 bp and 5000 bp, 100 times each.
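The simulation design above can be mimicked with a short stand-in. This is not Seq-Gen's code; it is a hypothetical sketch (the names `k80_probs`, `evolve` and `simulate_quartet` are made up). With equal base frequencies HKY reduces to K80. I assume "ts/tv 2.0" means the expected ratio of transitions to transversions, which under equal base frequencies corresponds to a rate ratio kappa = 4 (the default below); if 2.0 was meant as kappa itself, pass `kappa=2.0`. The true tree is read from the diagram as ((tip1,tip2),(tip3,tip4)) with the long branches leading to tip1 and tip4.

```python
import math
import random

BASES = "ACGT"
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def k80_probs(d, kappa):
    """K80 (HKY with equal base frequencies) change probabilities along a
    branch of d expected substitutions/site. kappa is the transition/
    transversion *rate* ratio. Returns (P_same, P_transition, P_each_transversion)."""
    beta = 1.0 / (kappa + 2.0)   # rate of each of the two transversions
    alpha = kappa * beta         # rate of the transition; alpha + 2*beta = 1
    e1 = math.exp(-4.0 * beta * d)
    e2 = math.exp(-2.0 * (alpha + beta) * d)
    p_ts = 0.25 + 0.25 * e1 - 0.5 * e2
    p_tv = 0.25 - 0.25 * e1
    return 1.0 - p_ts - 2.0 * p_tv, p_ts, p_tv


def evolve(seq, d, kappa, rng):
    """Change every site of seq independently along a branch of length d."""
    p_same, p_ts, _ = k80_probs(d, kappa)
    out = []
    for base in seq:
        r = rng.random()
        if r < p_same:
            out.append(base)
        elif r < p_same + p_ts:
            out.append(TRANSITION[base])
        else:  # the two transversions are equally likely
            out.append(rng.choice(TRANSVERSIONS[base]))
    return "".join(out)


def simulate_quartet(n_sites, long_len, short_len, kappa=4.0, seed=None):
    """Simulate tips of ((tip1,tip2),(tip3,tip4)); tip1 and tip4 are long.
    The root is placed at the tip1/tip2 node; K80 is reversible, so this
    choice does not change the distribution of the tip sequences."""
    rng = random.Random(seed)
    top = "".join(rng.choice(BASES) for _ in range(n_sites))  # equal base freqs
    bottom = evolve(top, short_len, kappa, rng)               # internal branch
    return {
        "tip1": evolve(top, long_len, kappa, rng),
        "tip2": evolve(top, short_len, kappa, rng),
        "tip3": evolve(bottom, short_len, kappa, rng),
        "tip4": evolve(bottom, long_len, kappa, rng),
    }
```

For example, `simulate_quartet(500, 1.0, 0.1, seed=1)` gives one 500 bp replicate for Tree 1, and `simulate_quartet(5000, 5.0, 0.1, seed=1)` one 5000 bp replicate for Tree 2; the post used 100 replicates of each.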

> I then checked the likelihood of each of the tree topologies and ran a
> Kishino-Hasegawa test on them. The results were:

> Tree 1 (long branches=1.0, short=0.1)
>      500 bp: 58 correct trees (0 sig. right), 42 wrong (0 sig. wrong)
>      5000 bp: 98 correct trees (47 sig. right), 2 wrong (0 sig. wrong)

> Tree 2 (long branches=5.0, short=0.1)
>      500 bp: 56 correct trees (0 sig. right), 44 wrong (0 sig. wrong)
>      5000 bp: 61 correct trees (0 sig. right), 39 wrong (0 sig. wrong)

> By significantly right, I mean that the wrong trees are rejected in
> favour of the right one.
> By significantly wrong, I mean that the right tree is rejected in favour
> of one of the wrong ones.
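The Kishino-Hasegawa comparison and the "significantly right/wrong" definitions can be sketched as follows. This uses the normal-approximation form of the test on per-site log-likelihood differences; the post does not say which variant (for example normal approximation or RELL bootstrap) was actually used. The per-site log-likelihoods are assumed to come from an ML program run under the model, and `classify_replicate` is a hypothetical helper, not part of any package.

```python
import math
import statistics


def kh_test(site_lnl_a, site_lnl_b):
    """Two-sided Kishino-Hasegawa test (normal approximation).

    Takes per-site log-likelihoods of one alignment under trees A and B and
    returns (delta, z, p); delta > 0 means A has the higher total lnL."""
    if len(site_lnl_a) != len(site_lnl_b) or len(site_lnl_a) < 2:
        raise ValueError("need two equal-length vectors with at least 2 sites")
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    n = len(diffs)
    delta = sum(diffs)
    # Standard deviation of the summed difference, estimated across sites.
    sd = math.sqrt(n * statistics.variance(diffs))
    if sd == 0.0:
        # No site-to-site variation: the approximation is undefined,
        # so report no significance.
        return delta, 0.0, 1.0
    z = delta / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard-normal tail
    return delta, z, p


def classify_replicate(true_sites, wrong_sites_list, alpha=0.05):
    """Apply the definitions above to one simulated data set.

    correct:   the true tree has the highest lnL
    sig_right: correct, and every wrong tree is rejected against it
    sig_wrong: some wrong tree rejects the true tree"""
    results = [kh_test(true_sites, w) for w in wrong_sites_list]
    correct = all(delta > 0 for delta, _, _ in results)
    sig_right = correct and all(p < alpha for _, _, p in results)
    sig_wrong = any(delta < 0 and p < alpha for delta, _, p in results)
    return correct, sig_right, sig_wrong
```

For a four-taxon tree, `wrong_sites_list` would hold the per-site log-likelihoods of the two alternative topologies.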

> These are just some quick and dirty results - so don't quote me on them. 
> Goes to show that saturation causes uncertainty in ML phylogenetic 
> estimation. 

I would love to see you explore a broader range of saturation in the long
branches with this simulation.  I agree that tree 2 looks subjectively like a lot
of saturation, but you have not really demonstrated that long-branch attraction
(LBA) cannot lead to significant support for the wrong tree in ML (with the right
model).  Would you push the saturation to 10 or 100 changes/site to see if you
get significant attraction at these outrageously high levels of saturation?  I
would certainly be interested in that result.  If you can get significant LBA
under these conditions, then you can conclude that saturation can cause more
than just uncertainty with ML and the right model.
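To put 10 or 100 changes/site in context, the expected proportion of sites that differ between the two ends of a single branch of length d under K80 (HKY with equal base frequencies, rate ratio kappa) is given below. Setting kappa = 4 assumes that "ts/tv 2.0" in the quoted post is the transition/transversion ratio rather than the rate ratio; the numbers come from the formula, not from any simulation.

```latex
p(d) = \tfrac{3}{4} - \tfrac{1}{4}e^{-4\beta d} - \tfrac{1}{2}e^{-2(\alpha+\beta)d},
\qquad \alpha = \frac{\kappa}{\kappa+2}, \quad \beta = \frac{1}{\kappa+2}

\kappa = 4: \quad p(d) = \tfrac{3}{4} - \tfrac{1}{4}e^{-2d/3} - \tfrac{1}{2}e^{-5d/3}

p(1) \approx 0.527, \quad p(5) \approx 0.741, \quad p(10) \approx 0.7497, \quad
\bigl|p(100) - \tfrac{3}{4}\bigr| < 10^{-29}
```

So going from 5 to 10 or 100 subst/site changes the expected divergence along one long branch by less than 0.01; at those lengths the tip at the end of a long branch is nearly independent of the sequence at its base.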

> Although you can get wrong answers, you don't get significantly wrong ones,
> which is what we want in a phylogenetic method. Of course these simulations
> assume a model of substitution that we know to be correct (I simulated it that
> way). My point really is that if you assume a model and the assumptions are
> wrong, you will get invalid results. The benefit of ML is that the model is
> explicit and the assumptions testable (see Goldman, 1993, for example;
> both of his 1993 papers are good).
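One way Goldman (1993) tests a model's absolute fit is to compare its maximized log-likelihood with that of the unconstrained (multinomial) model, which gives every site pattern its observed frequency; the null distribution of the difference is obtained by parametric bootstrap, i.e. by simulating data under the fitted model and refitting. A minimal sketch of the statistic follows. The function names are hypothetical, and the model log-likelihoods (observed and simulated) would have to come from an ML program and a simulator such as Seq-Gen.

```python
import math
from collections import Counter


def unconstrained_lnl(alignment):
    """Multinomial log-likelihood of an alignment {name: sequence}: each
    distinct site pattern (column) gets probability n_i / N."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same length")
    names = sorted(alignment)
    counts = Counter(zip(*(alignment[name] for name in names)))
    n_sites = sum(counts.values())
    return sum(n * math.log(n / n_sites) for n in counts.values())


def goldman_delta(alignment, model_lnl):
    """How far the fitted model's lnL falls short of the best possible fit."""
    return unconstrained_lnl(alignment) - model_lnl


def bootstrap_p_value(observed_delta, simulated_deltas):
    """Proportion of parametric-bootstrap deltas at least as large as observed."""
    hits = sum(d >= observed_delta for d in simulated_deltas)
    return hits / len(simulated_deltas)
```

A small p-value says the data are fitted worse than data generated by the model itself usually are, i.e. the model is rejected.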

This is certainly one thing I would like in a phylogenetic method.  I would also
like a method that does significantly support the right tree under most
conditions, and does not require me to provide a long, detailed set of
assumptions (i.e. model, etc.), the validity of which I cannot know.  I know
that you said earlier that ML assumptions can be tested, but you cannot test whether
you have the right model, and parameter estimates derived from the data
(assuming the model) always contain error.  I also realize that there are tests
to tell you things like:  does adding this new parameter significantly improve
my ability to explain the data?  However, there is no way to know whether you have
captured the set of most important parameters.  It always comes down to a sort
of curve-fitting exercise: of the models you tested, which one fits best,
regardless of the biological relevance.
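The "does adding this new parameter significantly improve my ability to explain the data?" question is usually answered with a likelihood-ratio test between nested models: twice the log-likelihood difference is compared with a chi-square distribution whose degrees of freedom equal the number of extra free parameters (1 for JC69 versus K80, 3 for K80 versus HKY). A minimal standard-library sketch follows; the function names are hypothetical. The chi-square reference is the usual asymptotic approximation and is unreliable when the simpler model fixes a parameter on the boundary of its range (for instance a proportion of invariable sites set to zero).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square with integer df >= 1,
    using the closed forms for even and odd degrees of freedom."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # erfc(sqrt(x/2)) + sqrt(2/pi) e^{-x/2} sum_{r=1}^{(df-1)/2} x^{r-1/2} / (1*3*...*(2r-1))
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * total)


def likelihood_ratio_test(lnl_simple, lnl_complex, extra_params):
    """LRT for nested models; returns (2 * delta lnL, p-value)."""
    stat = 2.0 * (lnl_complex - lnl_simple)
    return stat, chi2_sf(max(stat, 0.0), extra_params)
```

With made-up values, `likelihood_ratio_test(-2345.6, -2340.1, 1)` gives a statistic of about 11 on 1 degree of freedom and a p-value below 0.001. As the paragraph above points out, a significant result only says which of the models you tried fits best.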

Guy Hoelzer                              e-mail:  hoelzer at
Department of Biology                    phone:   702-784-4860
University of Nevada Reno                fax:     702-784-1302
Reno, NV  89557

More information about the Mol-evol mailing list