In article <9qhj8c$7vc$1 at mercury.hgmp.mrc.ac.uk>,
James McInerney <james.o.mcinerney at may.ie> wrote:
>It is true to say that if the model is exactly correct, then the correct
>tree will have the highest likelihood.
I don't think so. If I simulate data under a completely specified model
and then analyze that data under the same model, I don't always
recover the correct tree with an ML method. This is true even in cases
small enough to permit exhaustive search, so we can't blame failure
of the heuristic search either.
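A minimal sketch of that kind of experiment follows. The setup is our own assumption and is meant only as an illustration: four taxa, the Jukes-Cantor (JC69) model, hypothetical fixed branch lengths, and the true tree ((A,B),(C,D)). With four taxa there are only three unrooted trees, so the "search" is exhaustive by construction. To keep it short, maximum parsimony stands in for ML. For four taxa, parsimony reduces to counting "xxyy" site patterns, because only those patterns are informative. Because of that swap, the sketch shows finite-sample error in general rather than ML specifically. The helper names (`evolve`, `simulate_site`, `split_support`, `recovery_rate`) are hypothetical.

```python
import math
import random

BASES = "ACGT"

def evolve(base, t, rng):
    """Evolve one nucleotide along a branch of length t under JC69."""
    # Probability that the state at the end of the branch differs from the start.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    if rng.random() < p_change:
        return rng.choice([b for b in BASES if b != base])
    return base

def simulate_site(pendant, internal, rng):
    """Simulate one site on the unrooted tree ((A,B),(C,D)).

    The root sits at the node joining A and B; JC69 is reversible, so the
    root position does not change the distribution of site patterns.
    """
    x = rng.choice(BASES)            # node joining A and B
    y = evolve(x, internal, rng)     # node joining C and D
    a, b = evolve(x, pendant, rng), evolve(x, pendant, rng)
    c, d = evolve(y, pendant, rng), evolve(y, pendant, rng)
    return a, b, c, d

def split_support(sites):
    """Count informative 'xxyy' patterns supporting each of the 3 splits."""
    support = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for a, b, c, d in sites:
        if a == b and c == d and a != c:
            support["AB|CD"] += 1
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    return support

def recovery_rate(internal, pendant=0.1, n_sites=200, n_reps=200, seed=1):
    """Fraction of replicates in which the true split AB|CD wins outright (ties count as failures)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_reps):
        sites = [simulate_site(pendant, internal, rng) for _ in range(n_sites)]
        s = split_support(sites)
        if s["AB|CD"] > max(s["AC|BD"], s["AD|BC"]):
            wins += 1
    return wins / n_reps

if __name__ == "__main__":
    for t in (0.005, 0.05, 0.2):
        print(f"internal branch {t}: true tree recovered in {recovery_rate(t):.0%} of replicates")
```

The data are generated and analyzed under a fully specified model. Even so, a very short internal branch with a few hundred sites should give a recovery rate well short of 100%. A long internal branch should give nearly perfect recovery.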
Some trees, those with short internal branches in particular, are
extremely difficult to recover and with finite data you may not recover
the right tree even with a perfectly performing method. There is nothing
you can do if, by chance, events that should be rare occurred
a little more often than usual in your particular data.
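A back-of-the-envelope calculation shows how little signal a short internal branch can leave. It assumes the Jukes-Cantor model, independent sites, and a branch length t in expected substitutions per site. The values t = 0.005 and n = 200 sites are illustrative choices of ours. Here p(t) is the probability that the states at the two ends of the branch differ at a given site:

```latex
\begin{aligned}
p(t) &= \tfrac{3}{4}\left(1 - e^{-4t/3}\right) \\
\mathbb{E}[\text{sites differing across the branch}] &= n\,p(t) \\
\Pr[\text{no site differs across the branch}] &= \bigl(1 - p(t)\bigr)^{n} \\
t = 0.005,\ n = 200:\quad p(t) &\approx 0.0050,\qquad n\,p(t) \approx 1.0,\qquad \bigl(1 - p(t)\bigr)^{200} \approx 0.37
\end{aligned}
```

Under these assumptions, roughly a third of such data sets would record no change at all across the internal branch. Those data sets carry no direct evidence for the true split, however well the method performs.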
>It is probably naive to suggest we
>could always devise a model that will accurately reflect the evolution of
>the sequences of interest (base composition variation, superimposed
>substitutions along internal branches, funny things with indels, adaptive
>evolution etc.). However, the most appropriate test is almost certainly
>whether or not we are using a model that is appropriate for the data.
Sometimes it will be more appropriate to use a simplified model, even an
oversimplified model, than a more complex one with so many parameters
that you cannot accurately estimate them all. The model with best
performance may well be simpler than the truth.
Of course, if you have enough data to pin down the model exactly, a
fully correct, complex model looks very attractive. I'm skeptical that
we generally have that much data. Maybe in some viruses we will soon.
Mary Kuhner mkkuhner at genetics.washington.ed