use of a Likelihood Ratio Test, new questions

James McInerney james.o.mcinerney at may.ie
Tue Oct 16 04:58:55 EST 2001

Hi all,

There is one fundamental difficulty in all of this - although modeltest will
tell you whether or not one model is better than another (provided we are
examining models that are related in some way - nested models), the real
question should be whether or not you are using the _correct_ model.
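
For anyone who has not seen the relative test written out, here is a minimal sketch of a likelihood ratio test for two nested models that differ by a single free parameter (JC vs K2P, say). The function name and the log-likelihood values are made up for illustration and do not come from any real data set. The chi-square approximation assumes that the models really are nested and that the extra parameter is not fixed at a boundary value.

```python
import math

def lrt_df1(lnl_simple, lnl_complex):
    """Likelihood ratio test for nested models differing by ONE free parameter.

    Returns (statistic, p_value), comparing 2*(lnL_complex - lnL_simple)
    to a chi-square distribution with 1 degree of freedom.
    """
    stat = 2.0 * (lnl_complex - lnl_simple)
    stat = max(stat, 0.0)  # guard against tiny negative values from optimiser noise
    # For 1 d.f., P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods, for illustration only.
lnl_jc = -2345.67
lnl_k2p = -2330.12
stat, p = lrt_df1(lnl_jc, lnl_k2p)
print(f"2*delta lnL = {stat:.2f}, p = {p:.3g}")
```

A small p-value here only says that the richer model fits better than the simpler one; it says nothing about whether either of them is adequate.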

It is true to say that if the model is exactly correct, then the correct
tree will have the highest likelihood.  It is probably naive to suggest we
could always devise a model that will accurately reflect the evolution of
the sequences of interest (base composition variation, superimposed
substitutions along internal branches, funny things with indels, adaptive
evolution etc.).  However, the most appropriate test is almost certainly
whether or not we are using a model that is appropriate for the data.

This is an absolute test, not a relative test (as implemented in modeltest).

The question should be: "Does the model fit my data?"
not: "Is model X better than model Y?"

Nick Goldman proposed an absolute test in 1993.  It is still not widely
used, but it addresses the question we should probably be asking.  For
example, K2P might fit better than JC and still not be an appropriate model
(even GTR+gamma+I is still homogeneous over the tree).

If we use an inappropriate model then we cannot be sure of the results.  If
we use the best model available in PAUP/MOLPHY/PUZZLE/PAML/NHML then it
might still not be adequate/appropriate/realistic.

Although it might...

but we should test that....


p.s. Short synopsis of the Goldman method: simulate fake data sets under the
model to be tested (1,000 or more).  Analyse each fake data set and record
its likelihood.  Then see whether the original observed data could have been
generated by such a model (does the likelihood of the original data fall
within the 95% cutoff of the fake data?).  If so, then we assume the model
is appropriate for analysing the data.
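
The synopsis above can be sketched in a few lines of Python. This is a toy version under loud assumptions, not Goldman's full procedure: it uses only two sequences and JC69, so no tree is needed. As the test statistic I have used the difference between the unconstrained (multinomial) log-likelihood and the maximised model log-likelihood, which is my reading of the 1993 paper. All function names and the example counts are hypothetical.

```python
import math
import random

BASES = "ACGT"
PATTERNS = [a + b for a in BASES for b in BASES]  # 16 site patterns for 2 sequences

def fit_jc(counts):
    """ML estimate of the proportion of differing sites under JC69 (capped at 3/4)."""
    n = sum(counts.values())
    n_diff = sum(c for pat, c in counts.items() if pat[0] != pat[1])
    return min(n_diff / n, 0.75)

def jc_pattern_probs(p_diff):
    """Probability of each of the 16 patterns under JC69, given P(sites differ)."""
    return {pat: (1 - p_diff) / 4 if pat[0] == pat[1] else p_diff / 12
            for pat in PATTERNS}

def delta(counts):
    """Unconstrained (multinomial) lnL minus the maximised JC69 lnL."""
    n = sum(counts.values())
    probs = jc_pattern_probs(fit_jc(counts))
    lnl_multinomial = sum(c * math.log(c / n) for c in counts.values() if c > 0)
    lnl_model = sum(c * math.log(probs[pat]) for pat, c in counts.items() if c > 0)
    return lnl_multinomial - lnl_model

def simulate(p_diff, n_sites, rng):
    """Draw n_sites patterns from JC69 with the given P(sites differ)."""
    probs = jc_pattern_probs(p_diff)
    draws = rng.choices(PATTERNS, weights=[probs[p] for p in PATTERNS], k=n_sites)
    counts = {pat: 0 for pat in PATTERNS}
    for pat in draws:
        counts[pat] += 1
    return counts

def bootstrap_fit_test(counts, n_reps=1000, seed=1):
    """Return (observed delta, p-value) from a parametric bootstrap."""
    rng = random.Random(seed)
    n = sum(counts.values())
    p_hat = fit_jc(counts)           # fit the model to the real data
    observed = delta(counts)
    # Simulate under the fitted model and re-fit the model to every fake data set.
    simulated = [delta(simulate(p_hat, n, rng)) for _ in range(n_reps)]
    n_extreme = sum(1 for d in simulated if d >= observed)
    return observed, (n_extreme + 1) / (n_reps + 1)

# Hypothetical pattern counts with strongly skewed base composition (illustration only).
skewed = {pat: 0 for pat in PATTERNS}
skewed.update({"AA": 700, "CC": 100, "GG": 100, "TT": 50, "AG": 50})
obs, p = bootstrap_fit_test(skewed, n_reps=200)
print(f"delta = {obs:.1f}, p = {p:.3f}")
```

A small p-value means that the observed data fit the model worse than almost all of the data sets that the model itself generated, i.e. the model is probably not adequate. For real data you would replace the pairwise JC69 pieces with a proper simulation and likelihood calculation on the tree, re-estimating the parameters for each replicate.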

