use of a Likelihood Ratio Test, new questions

Gordon D. Pusch gdpusch at NO.xnet.SPAM.com
Tue Oct 16 12:03:12 EST 2001

James McInerney <james.o.mcinerney at may.ie> writes:

> Hi all,
> There is one fundamental difficulty in all of this - although modeltest will
> tell you whether or not one model is better than another (provided we are
> examining models that are related in some way - nested models), the real
> question should be whether or not you are using the _correct_ model.
> It is true to say that if the model is exactly correct, then the correct
> tree will have the highest likelihood.  It is probably naive to suggest we
> could always devise a model that will accurately reflect the evolution of
> the sequences of interest (base composition variation, superimposed
> substitutions along internal branches, funny things with indels, adaptive
> evolution etc.).  However, the most appropriate test is almost certainly
> whether or not we are using a model that is appropriate for the data.

An interesting approach to dealing with this problem is the one proposed 
by Diaconis P., Graham, R., and Holmes S., ``Matchings and Phylogenies,''
Proc. Nat. Acad. Sciences, 14600-14602 (1999). The set of ``ultrametric''
inequalities satisfied by the phylogenetic distances on the set of all
possible phylogenetic trees with N leaves defines the vertices of a 
convex polytope. Diaconis, Graham, and Holmes therefore suggest that 
the likelihood function be extended such that it is well-defined 
everywhere within the polytope, instead of just at the vertices 
(this can always be done), and that optimization be done on the 
polytope.  If the Maximum Likelihood estimate (or for Bayesians, 
the Maximum a-posteriori Probability estimate) tends toward a vertex, 
the data strongly supports the tree corresponding to that vertex. 
If the optimum falls on an edge or a face of the polytope, the data
supports a ``mixture'' of the trees corresponding to the vertices 
bounding that edge or face, and the mixture coefficients can be 
interpreted as the relative probabilities of each tree being ``correct.''
(Note that horizontal transfer would also tend to produce this result).  
And if the optimum falls in the interior, then the data supports a mixture 
of all possible trees, and it's probably time to take a closer look 
at it... :-/

A side-benefit of this ``embedding'' or ``continuation'' approach to
phylogenetic optimization is that continuous-parameter optimizers are
usually far faster and more efficient than combinatorial optimizers... :-)
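
To make the ``mixture'' reading above concrete, here is a minimal sketch
in Python. It is NOT the polytope construction from the cited paper: it
assumes a simplified setting in which we already have a short list of
candidate trees and a precomputed table of per-site likelihoods under
each one (the input `site_lik` and the function name `fit_tree_mixture`
are hypothetical, introduced only for illustration). The mixture weights
then live on a simplex whose vertices are the individual trees, and a
plain EM iteration, which is a continuous optimizer, finds the
maximum-likelihood weights.

```python
def fit_tree_mixture(site_lik, n_iter=1000, tol=1e-12):
    """Maximum-likelihood mixture weights over a fixed list of trees.

    site_lik[s][k] is the likelihood of alignment site s under candidate
    tree k. This table is an assumed input: it would be computed elsewhere,
    with branch lengths and substitution-model parameters held fixed.
    Returns a list of weights that are non-negative and sum to 1.
    """
    n_sites = len(site_lik)
    n_trees = len(site_lik[0])
    # Start in the interior of the simplex: every tree equally weighted.
    w = [1.0 / n_trees] * n_trees
    for _ in range(n_iter):
        totals = [0.0] * n_trees
        for row in site_lik:
            # Mixture likelihood of this site under the current weights.
            denom = sum(wk * lk for wk, lk in zip(w, row))
            # E-step: share of this site attributed to each tree.
            for k in range(n_trees):
                totals[k] += w[k] * row[k] / denom
        # M-step: new weight = average share over all sites.
        w_new = [t / n_sites for t in totals]
        converged = max(abs(a - b) for a, b in zip(w_new, w)) < tol
        w = w_new
        if converged:
            break
    return w


if __name__ == "__main__":
    # Toy data: half the sites favour tree 0, half favour tree 1.
    toy = [[0.9, 0.1, 0.1]] * 10 + [[0.1, 0.9, 0.1]] * 10
    print(fit_tree_mixture(toy))
```

If every site favours the same tree, the weights run to that tree's
vertex; if the sites split between two trees, the optimum sits on the
edge joining them, and the weights can be read as the relative support
for each tree in the sense described above.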

-- Gordon D. Pusch   

perl -e '$_ = "gdpusch\@NO.xnet.SPAM.com\n"; s/NO\.//; s/SPAM\.//; print;'

