James McInerney <james.o.mcinerney at may.ie> writes:
> Hi all,
>
> There is one fundamental difficulty in all of this: although modeltest will
> tell you whether or not one model is better than another (provided we are
> examining models that are related in some way - nested models), the real
> question should be whether or not you are using the _correct_ model.
>
> It is true to say that if the model is exactly correct, then the correct
> tree will have the highest likelihood. It is probably naive to suggest we
> could always devise a model that will accurately reflect the evolution of
> the sequences of interest (base composition variation, superimposed
> substitutions along internal branches, funny things with indels, adaptive
> evolution etc.). However, the most appropriate test is almost certainly
> whether or not we are using a model that is appropriate for the data.
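The quoted point about nested models can be made concrete with a likelihood ratio test, the kind of pairwise comparison the quoted text attributes to modeltest. This is a minimal sketch under stated assumptions. The log-likelihoods are hypothetical. The model pair is also an illustrative assumption: JC69 nested in K80, differing by one free parameter. The chi-square reference distribution is the usual large-sample approximation. It can be off when a parameter sits on a boundary of its range, such as a proportion of invariable sites equal to zero. As the quoted text stresses, a small p-value only says the richer model fits better than the simpler one, not that either model is correct.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail plus correction terms x^(j - 1/2) / (2j - 1)!!
    term = math.sqrt(x)
    acc = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        acc += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * acc


def likelihood_ratio_test(lnl_simple: float, lnl_complex: float, extra_params: int):
    """Compare two nested models; returns (LR statistic, approximate p-value)."""
    # For properly nested models the richer model cannot fit worse, so clamp
    # tiny negative differences caused by imperfect numerical optimization.
    stat = max(0.0, 2.0 * (lnl_complex - lnl_simple))
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods: JC69 (simpler) vs. K80 (one extra parameter).
stat, p = likelihood_ratio_test(lnl_simple=-1234.5, lnl_complex=-1228.1, extra_params=1)
print(f"LR = {stat:.2f}, p = {p:.2g}")
```

The degrees of freedom equal the number of extra free parameters in the richer model. The comparison is only meaningful when one model is a special case of the other.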
An interesting approach to dealing with this problem is the one proposed
by Diaconis, P., Graham, R., and Holmes, S., ``Matchings and Phylogenies,''
Proc. Nat. Acad. Sciences, 14600-14602 (1999). Taken over all possible
phylogenetic trees with N leaves, the ``ultrametric'' inequalities
satisfied by the phylogenetic distances define the vertices of a convex
polytope, one vertex per tree. Diaconis, Graham, and Holmes therefore
suggest extending the likelihood function so that it is well-defined
everywhere within the polytope, not just at the vertices (this can
always be done), and then optimizing over the polytope. If the Maximum
Likelihood estimate (or, for Bayesians, the Maximum A Posteriori
estimate) tends toward a vertex, the data strongly support the tree
corresponding to that vertex. If the optimum falls on an edge or a face
of the polytope, the data support a ``mixture'' of the trees at the
vertices bounding that edge or face, and the mixture coefficients can be
interpreted as the relative probabilities of each tree being ``correct.''
(Note that horizontal transfer would also tend to produce this result.)
And if the optimum falls in the interior, the data support a mixture of
all possible trees, and it's probably time to take a closer look at the
data... :-/
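The ``mixture'' interpretation can be illustrated with a minimal sketch. It uses a plainly swapped-in simplification, not the construction from the paper. Instead of optimizing over the polytope itself, it fits mixture weights over a small set of candidate trees. It takes as input a hypothetical table of per-site likelihoods, with one row per alignment site and one column per candidate tree, of the kind an ML program could report. The weights are fitted by EM (expectation-maximization), which climbs the concave mixture log-likelihood over the simplex of weights. The names `fit_tree_mixture` and `classify_optimum` and the 1e-3 support threshold are illustrative choices.

```python
from typing import List, Sequence


def fit_tree_mixture(site_lik: Sequence[Sequence[float]],
                     max_iter: int = 1000, tol: float = 1e-12) -> List[float]:
    """EM estimate of mixture weights over candidate trees.

    site_lik[s][k] is the (hypothetical) likelihood of alignment site s
    under candidate tree k; all entries are assumed to be positive.
    """
    n_sites = len(site_lik)
    n_trees = len(site_lik[0])
    w = [1.0 / n_trees] * n_trees  # start at the centre of the simplex
    for _ in range(max_iter):
        new_w = [0.0] * n_trees
        for row in site_lik:
            mix = sum(wk * lk for wk, lk in zip(w, row))
            for k in range(n_trees):
                # Posterior probability that site s was generated by tree k.
                new_w[k] += w[k] * row[k] / mix
        new_w = [x / n_sites for x in new_w]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w


def classify_optimum(weights: Sequence[float], threshold: float = 1e-3) -> str:
    """Name the part of the weight simplex on which the optimum lies."""
    support = sum(1 for x in weights if x > threshold)
    if support == 1:
        return "vertex"    # a single tree is strongly supported
    if support == 2:
        return "edge"      # mixture of two trees
    if support == len(weights):
        return "interior"  # every candidate tree contributes
    return "face"          # mixture of a subset of the candidates


# Hypothetical per-site likelihoods: 6 sites favour tree 0, 4 favour tree 1.
sites = [[0.020, 0.010]] * 6 + [[0.010, 0.020]] * 4
w = fit_tree_mixture(sites)
print([round(x, 3) for x in w], classify_optimum(w))
```

For this toy table the optimum lies on the edge at weights (0.8, 0.2). Setting the derivative of the log-likelihood, 6/(1+w) - 4/(2-w), to zero gives w = 0.8 for tree 0. If every site favoured tree 0, the weight on tree 1 would shrink toward zero and the optimum would move to a vertex.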
A side-benefit of this ``embedding'' or ``continuation'' approach to
phylogenetic optimization is that continuous-parameter optimizers are
usually far more efficient than combinatorial search over discrete
tree topologies... :-)
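The efficiency argument is easier to appreciate given how fast the discrete search space grows. This short sketch uses the standard counts. There are (2N-3)!! rooted and (2N-5)!! unrooted binary trees with N labeled leaves, where k!! is the product of the odd numbers up to k. The function names are illustrative.

```python
def double_factorial_odd(k: int) -> int:
    """Return k!! = 1 * 3 * 5 * ... * k for odd k >= 1."""
    result = 1
    for odd in range(3, k + 1, 2):
        result *= odd
    return result


def num_rooted_binary_trees(n_leaves: int) -> int:
    """Number of rooted binary trees on n labeled leaves: (2n - 3)!!."""
    if n_leaves < 2:
        raise ValueError("need at least 2 leaves")
    return double_factorial_odd(2 * n_leaves - 3)


def num_unrooted_binary_trees(n_leaves: int) -> int:
    """Number of unrooted binary trees on n labeled leaves: (2n - 5)!!."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    return double_factorial_odd(2 * n_leaves - 5)


for n in (4, 10, 20, 50):
    print(n, num_unrooted_binary_trees(n))
```

Already at 10 taxa there are 2,027,025 unrooted topologies. This super-exponential growth is what makes exhaustive combinatorial search impractical.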
-- Gordon D. Pusch
perl -e '$_ = "gdpusch\@NO.xnet.SPAM.com\n"; s/NO\.//; s/SPAM\.//; print;'