Bootstrap and Happiness

Joe Felsenstein joe at evolution.genetics.washington.edu
Fri Dec 19 20:48:26 EST 1997

In article <67et62$oqt at net.bio.net>,  <newsmgr at merrimack.edu> wrote:
>From: foxik at aol.com (Foxik)
>On a website <<http://cmgm.Stanford.EDU/phylip/seqboot.html>> I found the
>following statement:
><<The R option allows the user to set the number of replicate data sets. This
>defaults to 100. Most statisticians would be happiest with 1000 to 10,000
>replicates in a bootstrap, but 100 gives a good rough picture. You will have to
>decide this based on how long a running time you want.>>
>My question is how happiness of a statistician is related to statistical
>significance. Using GCG-PAUP I found that with neighbor-joining option I will get
>very different results using 100, 1,000 or 10,000 replicates in one particular
>case. The same is true for the parsimony option. However, results obtained with
>10,000 replicates with N-J option are similar to results obtained with 1,000
>replicates with P option. It seems like to achieve the same state of happiness
>it requires only 1,000 P versus 10,000 N J.  I would greatly appreciate any
>comments on this issue.

Well, I'm the person who wrote the words you quoted -- they are in the
documentation file SeqBoot.doc in PHYLIP.

I very much doubt that there is any general rule of the sort "Foxik" suggests,
that 1,000 replicates of one method give similar answers to 10,000 replicates
of another.  This is because the replicates are independent, and the
majority-rule consensus tree contains those groups that show up frequently.
The expected fraction of times a group shows up, among the replicates, is
the same whether there are 1,000 or 10,000 replicates.  Of course in
practice the observed fraction will vary randomly around that expectation,
more so with fewer replicates, and that may be responsible for the pattern
seen in this comparison.
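
To illustrate the point about expectation versus random variation, here is
a minimal sketch in Python.  It is not anything from PHYLIP or PAUP: it
simply assumes that a group appears in each bootstrap replicate
independently with some fixed probability (the names observed_support and
summarize, and the value 0.7, are made up for the illustration).  The
average observed fraction stays near the same value for any number of
replicates, while the run-to-run spread shrinks as replicates are added.

```python
import random


def observed_support(p_true, n_replicates, rng):
    """Fraction of replicates containing the group, assuming each replicate
    independently contains it with probability p_true (an assumption)."""
    hits = sum(1 for _ in range(n_replicates) if rng.random() < p_true)
    return hits / n_replicates


def summarize(p_true, n_replicates, n_runs=200, seed=1):
    """Repeat the whole bootstrap n_runs times; return (mean, spread) of the
    observed support, where spread is max minus min across runs."""
    rng = random.Random(seed)
    values = [observed_support(p_true, n_replicates, rng) for _ in range(n_runs)]
    mean = sum(values) / n_runs
    spread = max(values) - min(values)
    return mean, spread


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        mean, spread = summarize(0.7, n)
        print(f"{n:>6} replicates: mean support {mean:.3f}, spread {spread:.3f}")
```

Two single runs with different numbers of replicates can therefore differ
by chance alone, without the method or the replicate count changing what
is being estimated.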

One should add that there have been demonstrations by Zharkikh and Li and
by Hillis and Bull, that the bootstrap P values are biased -- generally
being too low.  Our comments on that will be found in Systematic Biology
in 1993:

  J. Felsenstein and H. Kishino.  1993.  Is there something wrong with
  the bootstrap on phylogenies? A reply to Hillis and Bull.  Systematic
  Biology  42: 193-200.

As for the number of replicates, 100 is a bit small but 1,000 should be
more than enough.  Statisticians are only happy with 10,000 or more but
that is because they imagine themselves to be achieving great precision, and
they worry about much smaller fluctuations than biologists do.
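
For a rough sense of the size of those fluctuations, one can treat the
number of replicates showing a group as binomial, which is an assumption
of this sketch rather than something stated above.  The standard error of
the observed fraction is then sqrt(p(1-p)/n).  The function name below is
made up for the illustration.

```python
import math


def support_standard_error(p, n_replicates):
    """Binomial standard error of an observed bootstrap fraction,
    assuming independent replicates that each show the group with
    probability p."""
    return math.sqrt(p * (1.0 - p) / n_replicates)


if __name__ == "__main__":
    # p = 0.5 is the worst case: it gives the largest standard error.
    for n in (100, 1000, 10000):
        print(n, round(support_standard_error(0.5, n), 4))
```

At p = 0.5 this gives a standard error of 0.05 for 100 replicates, about
0.016 for 1,000, and 0.005 for 10,000.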

Joe Felsenstein         joe at genetics.washington.edu
 Dept. of Genetics, Univ. of Washington, Box 357360, Seattle, WA 98195-7360 USA
