Parametric bootstrapping

Andrew Rambaut andrew.rambaut at zoology.oxford.ac.uk
Tue May 2 05:51:00 EST 2000


In article <8ek5ui$av5$1 at mercury.hgmp.mrc.ac.uk>, Vijay Aswani, Ph.D.
<vaswani at sinfo.net> wrote:

> significance I thought I would do parametric bootstrapping. Can anyone
> please show me how I go about this, step by step?

Nick Goldman has a web page that describes the procedure:

http://www.zoo.cam.ac.uk/zoostaff/goldman/index.html

Click on the link that says "The Kishino-Hasegawa test of phylogenies
is seriously biased: more information here...".

> 1. Generate simulated data sets using Seq-Gen (how many are
> appropriate?) Do

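Perhaps 200? It depends on how borderline the test statistic is: the
closer your real value sits to the edge of the simulated distribution,
the more replicates you need to pin down the P-value. The nice thing
about parametric bootstrapping is that the replicates are independent,
so if you have more than one machine available you can divide the task
between them.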
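
As a rough illustration of dividing the work, here is a small Python
sketch that splits the replicate numbers into near-equal chunks, one
per machine. The function name is made up for this example, and it
only does the bookkeeping; it does not run Seq-Gen or PAUP*.

```python
def split_replicates(n_reps, n_machines):
    """Return one (first, last) pair of 1-based replicate numbers per machine."""
    if n_reps < 0 or n_machines < 1:
        raise ValueError("need n_reps >= 0 and n_machines >= 1")
    base, extra = divmod(n_reps, n_machines)
    chunks = []
    start = 1
    for i in range(n_machines):
        # The first `extra` machines take one additional replicate each.
        size = base + (1 if i < extra else 0)
        if size:
            chunks.append((start, start + size - 1))
        start += size
    return chunks


# Hypothetical example: 200 replicates over 3 machines.
for first, last in split_replicates(200, 3):
    print(f"replicates {first}-{last}")
```

If you simulate separately on each machine rather than simulating once
and splitting the file, give each run a different random seed, otherwise
the machines will produce identical datasets.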

> I generate two groups of datasets - one with the best tree and the other
> with the best tree with the sub-group constrained as monophyletic?)

No. You only generate the data under the NULL hypothesis. That is, if
your ML tree does not exhibit monophyly, then your NULL hypothesis is
the best tree that does.

The question you are asking is, "If the truth is that the group is
monophyletic, is it likely that I got the non-monophyly result due
to random error?"

So you simulate under the NULL hypothesis and, for each simulated
dataset, you perform exactly the same analysis that you did for your
real data.
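
The shape of that loop, as a minimal Python sketch. The two callables
are placeholders I have invented: `simulate` stands in for Seq-Gen
generating one dataset under the NULL tree and model, and `analyse`
stands in for the full PAUP* analysis returning the test statistic.
The toy versions below just use random numbers so the sketch runs on
its own; they are not a phylogenetic analysis.

```python
import random


def parametric_bootstrap(simulate, analyse, n_reps, seed=1):
    """Simulate n_reps datasets under the null and analyse each one identically."""
    rng = random.Random(seed)
    return [analyse(simulate(rng)) for _ in range(n_reps)]


# Toy stand-in for the simulator: 50 draws from a standard normal.
def simulate_null(rng):
    return [rng.gauss(0.0, 1.0) for _ in range(50)]


# Toy stand-in for the analysis: absolute value of the sample mean.
def analyse(data):
    return abs(sum(data) / len(data))


null_stats = parametric_bootstrap(simulate_null, analyse, n_reps=200)
print(len(null_stats), "simulated test statistics")
```

The point is only that the same `analyse` step is applied to every
simulated dataset, and it should be the same one you applied to the
real data.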

> 2. I guess I would then have to compute the likelihood scores for each of
> the simulated datasets. Assuming I did a 100, is there any automated way of
> doing this? Or do I have to open each dataset in PAUP*, load the appropriate
> tree, calculate likelihood scores and append them to a file?

Write a PAUP block containing the commands for finding the ML tree
both with and without the monophyly constraint. Use the program
Phy2Nex, which is included in the Seq-Gen package, to create a NEXUS
file with your PAUP block inserted after each replicate dataset.
Run this file through PAUP.

The best way of writing the likelihoods to a file is to (after
the tree search) use the command:

LSCORE 1 / FILE=NULL.likelihoods APPEND=YES;

This writes the likelihood of the tree to a file called
NULL.likelihoods, appending each to the end of the file.

> 3. What do I do next? Do I make a matrix doing subtractions of every
> combination of likelihood values from the best tree set with those from

No, you take the difference between the log likelihood for the
monophyly-constrained tree and the unconstrained tree for each
simulated dataset, one difference per dataset. You then compare these
simulated differences with the difference in log likelihood you got
for your real data.
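
A minimal Python sketch of that subtraction, assuming you have already
read the scores into two lists in the same replicate order. I am
assuming the scores are -ln L values, as PAUP* reports them, so the
delta is constrained minus unconstrained and should not be negative if
both searches found their optimum. The function name and the numbers
are made up for illustration.

```python
def delta_per_replicate(neg_lnl_constrained, neg_lnl_unconstrained):
    """Pair the scores by replicate and return one delta per replicate."""
    if len(neg_lnl_constrained) != len(neg_lnl_unconstrained):
        raise ValueError("the two score lists must have the same length")
    # Scores are assumed to be -ln L, so a worse (constrained) tree has
    # the larger score and the delta is >= 0.
    return [c - u for c, u in zip(neg_lnl_constrained, neg_lnl_unconstrained)]


# Hypothetical scores for three simulated datasets.
constrained = [2105.3, 2098.7, 2110.2]
unconstrained = [2104.1, 2098.7, 2107.0]
deltas = delta_per_replicate(constrained, unconstrained)
print(deltas)
```

Note that it is one delta per dataset, not a matrix of every
combination.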

> constrained tree set? Is there any program to do this? Is this what

Excel works well. Sort the simulated deltas and see what percentile
your real delta falls at.
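
If you would rather not use Excel, the same comparison is a few lines
of Python. This sketch counts how many simulated deltas are at least as
large as the real one. Adding 1 to the numerator and denominator is a
common convention for Monte Carlo P-values (it counts the real dataset
as one more draw from the NULL); that is my choice here, not something
PAUP* or Seq-Gen does for you. The function name and the numbers are
invented for the example.

```python
def bootstrap_p_value(simulated_deltas, observed_delta):
    """One-sided Monte Carlo P-value for the observed delta."""
    if not simulated_deltas:
        raise ValueError("need at least one simulated delta")
    # Count simulated deltas as extreme as, or more extreme than, the real one.
    n_extreme = sum(1 for d in simulated_deltas if d >= observed_delta)
    return (n_extreme + 1) / (len(simulated_deltas) + 1)


# Hypothetical values: four simulated deltas and a real delta of 1.5.
p = bootstrap_p_value([0.0, 0.5, 1.0, 2.0], 1.5)
print(f"P = {p:.3f}")
```

With 200 replicates the smallest value this can return is 1/201, which
is another reason to run more replicates when the result is borderline.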

> 4. Am I on the right track?!

Nearly.

Andrew
---

More information about the Mol-evol mailing list