In article <8f6qu1$d8t$1 at mercury.hgmp.mrc.ac.uk>, Nick Goldman
<N.Goldman at zoo.cam.ac.uk> wrote:
> Yes, often the distribution of "L_ml - L_null" is quite tight (small
> range). This will indeed often be because the ML tree for a simulated
> data set will be very similar to the null hypothesis tree on which the
> data were simulated. This simply reflects the situation under the
> assumption that the null hypothesis is true---and so it is appropriate
> to reject the null hypothesis in favour of the alternative hypothesis
> (which is that some other tree is correct). You have to think very
> carefully about what your hypotheses are before you test them!
This leads to a concern that I have about the use of LRTs and parametric
bootstrapping. The limits of the null distribution are constrained
(sometimes greatly) by the assumed evolutionary model, making it easier to
reject the null by LRT. Therefore, the type-I error rate will be larger
using this approach than if simpler models were assumed. The question is:
does parametric bootstrapping lead to an unacceptably high type-I error
rate? Of course, this will only be a problem when the assumed
evolutionary model is not accurate, which is always the case to some
degree. For example, if you estimate a TI/TV ratio from your data and
build that estimate into your evolutionary model, the TRUE TI/TV ratio
was almost certainly somewhat different from the estimate on every
branch of the TRUE tree. The null distribution you create through
repeated simulation is therefore guaranteed to differ from the range of
likelihoods that could have been explored during the evolution of your
taxa. The realized variation in
TI/TV ratios would surely broaden the TRUE null distribution, compared to
the one estimated through simulations, leading to inflated type-I error
rates in the analysis. I am curious whether there is any evidence bearing on
this potential problem.
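To make the worry concrete, here is a toy sketch (my own illustration,
not a phylogenetic analysis: a simple Poisson model stands in for the
evolutionary model, and a normally wobbling mean stands in for
branch-to-branch TI/TV variation). It compares the parametric-bootstrap
null distribution built with the parameter fixed at its estimate against
a "true" null in which the parameter varies across replicates:

```python
import math
import random

def poisson(mu, rng):
    # Knuth's method: count uniform draws until their product falls below e^-mu.
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while p > limit:
        p *= rng.random()
        k += 1
    return k - 1

def lrt_stat(counts, mu0):
    # 2 * (lnL_ml - lnL_null) for i.i.d. Poisson counts; the log(c!) terms
    # are identical in both likelihoods, so they are dropped.
    mu_hat = sum(counts) / len(counts)
    if mu_hat == 0.0:
        mu_hat = 1e-9  # guard against log(0); vanishingly rare here
    lnl = lambda mu: sum(c * math.log(mu) - mu for c in counts)
    return 2.0 * (lnl(mu_hat) - lnl(mu0))

def null_dist(mu0, sigma, n=50, reps=2000, seed=1):
    # sigma = 0 mimics the standard parametric bootstrap: simulate with the
    # parameter fixed at its estimate.  sigma > 0 mimics unmodelled
    # variation in the parameter (the TI/TV-per-branch worry above).
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        mu_true = max(0.05, rng.gauss(mu0, sigma))
        data = [poisson(mu_true, rng) for _ in range(n)]
        stats.append(lrt_stat(data, mu0))
    return sorted(stats)

fixed = null_dist(mu0=3.0, sigma=0.0)
wobbly = null_dist(mu0=3.0, sigma=0.5)
crit = fixed[int(0.95 * len(fixed))]            # 5% cutoff from the tight null
type1 = sum(s > crit for s in wobbly) / len(wobbly)
print("cutoff from fixed-parameter null:", round(crit, 2))
print("realized type-I rate under wobbly truth:", round(type1, 3))
```

The fixed-parameter null is tight (its cutoff lands near the chi-square
value), while the replicate-to-replicate wobble broadens the true null,
so testing against the tight cutoff rejects far more than 5% of the
time. The numbers are purely illustrative, but the direction of the
effect is the inflation described above.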
--
Guy Hoelzer
Department of Biology
University of Nevada Reno
Reno, NV 89557