Parametric bootstrapping
Nick Goldman
N.Goldman at zoo.cam.ac.uk
Mon May 8 06:25:07 EST 2000
Vijay Aswani wrote:
>
> A number of those who responded pointed me to the web site of Nick Goldman
> who had a manuscript and some very helpful procedural tips on the process.
I should point out that the ms. in question (see
http://www.zoo.cam.ac.uk/zoostaff/goldman/tests) is about problems that
I and others contend exist with common uses of the Kishino-Hasegawa test
of phylogenies. It is a lot more than simply a recipe for parametric
bootstrapping, although it does include some parametric bootstrap
examples.
The ms. is "accepted pending minor revision"; these minor revisions
are under way, and the revised version of the ms. should be complete
this week and will be posted on the WWW site soon afterwards. Anyone is
invited to e-mail me if they would like to be kept informed of progress.
> This brings me to my question: isn't using the hypothesis tree's topology
> and ML parameters to build the 100 datasets and then computing the best tree
> in each dataset a bit circular? Wouldn't the best tree in each case be the
> same tree whose topology and ML parameters were used to create the data sets
> in the first place? Perhaps the reason why L_null and L_ml differ so little
> is that the dataset was created from the parameters of the null tree.
>
> If this is true, then the range of L_ml - L_null would be very small (since
> they would be almost the same) and almost every hypothesis tested would be
> rejected.
It is not circular. It is the standard strategy of classical
hypothesis testing: ask "if the null hypothesis were true, what would be
the distribution of my test statistic?". When the null hypothesis contains
unknown parameters, you may have to estimate values for them in order to
work out the distribution in question. So long as your method for
working out the distribution in question allows for the fact that such
parameters have been estimated (which is done by analyzing the simulated
data in the same way that you analyzed the original data), the procedure
is justified.
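To make the logic concrete, here is a minimal sketch of the same
strategy on a toy problem rather than a phylogeny. Everything in it is
an assumption made for illustration: the null hypothesis is "normal
data with mean 0 and unknown variance", the alternative leaves the mean
free, and the function names are hypothetical. The point it tries to
show is that the unknown parameter is estimated under the null, data
are simulated from that fit, and each simulated data set is then
analysed in exactly the same way as the original data.

```python
import math
import random


def delta_stat(data):
    """Return L_ml - L_null for a toy normal model.

    Null: mean fixed at 0, variance estimated.
    Alternative: mean and variance both estimated.
    With ML variance estimates the difference in maximised
    log-likelihoods reduces to (n/2) * log(var_null / var_ml).
    """
    n = len(data)
    mean = sum(data) / n
    var_null = sum(x * x for x in data) / n
    var_ml = sum((x - mean) ** 2 for x in data) / n
    return 0.5 * n * math.log(var_null / var_ml)


def fit_null(data):
    """ML estimate of the standard deviation under the null (mean 0)."""
    return math.sqrt(sum(x * x for x in data) / len(data))


def parametric_bootstrap(data, n_reps=100, seed=0):
    """Simulate under the fitted null and re-analyse each replicate."""
    rng = random.Random(seed)
    observed = delta_stat(data)
    sigma_hat = fit_null(data)  # nuisance parameter estimated from the data
    null_deltas = []
    for _ in range(n_reps):
        # Simulate a data set of the same size from the fitted null model.
        sim = [rng.gauss(0.0, sigma_hat) for _ in data]
        # Same analysis as for the real data, including re-estimation.
        null_deltas.append(delta_stat(sim))
    # Proportion of replicates at least as extreme as the observed value,
    # counting the observed data set itself as one replicate.
    exceed = sum(d >= observed for d in null_deltas)
    p_value = (1 + exceed) / (n_reps + 1)
    return observed, null_deltas, p_value


if __name__ == "__main__":
    example = [2.1, 1.8, 2.4, 1.9, 2.2, 2.0, 1.7, 2.3, 2.1, 1.9]
    obs, deltas, p = parametric_bootstrap(example)
    print("observed:", obs)
    print("largest simulated:", max(deltas))
    print("p-value:", p)
```

In a phylogenetic setting the simulation step would use the null
hypothesis tree with its ML branch lengths and model parameters, and
the re-analysis step would be a full ML tree search on each simulated
alignment; the toy model above only stands in for those steps.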
Yes, often the distribution of "L_ml - L_null" is quite tight (small
range). This will indeed often be because the ML tree for a simulated
data set will be very similar to the null hypothesis tree on which the
data were simulated. This simply reflects the situation under the
assumption that the null hypothesis is true. If the value of
"L_ml - L_null" observed for the real data falls well outside this
tight distribution, then it is appropriate to reject the null
hypothesis in favour of the alternative hypothesis (which is that some
other tree is correct). You have to think very carefully about what
your hypotheses are before you test them!
Nick Goldman
-----------------------------------------------------------------------
Nick Goldman, Dept of Zoology, tel: +44-(0)1223-336649
Downing St, Cambridge CB2 3EJ, U.K. fax: +44-(0)1223-336679
N.Goldman at zoo.cam.ac.uk http://www.zoo.cam.ac.uk/zoostaff/goldman