advice on bootstrapping values

Joe Felsenstein joe at
Tue Mar 24 11:04:41 EST 1998

In article <6ev5h0$7ml at>,
Brian Fristensky  <frist at cc.UManitoba.CA> wrote:
>Mike Tennant wrote:
>> that well over 50% of the positions have gap characters in), but
>> the core regions are all aligned in a sensible manner.

This suggests that you might want to concentrate on the "core regions"
as the other parts might be more likely to be misaligned and contribute a
lot of noise.

>> I've bootstrapped
>> the resulting tree (1000 samples) in clustalw, and seen that the values
>> on the nodes in the sub-trees (those sequences which are easily related)
>> are relatively high (80+ %), but the values between sub-trees can be
>> very low (as low as 2%).
>>     I'd appreciate it if anybody could comment on these values,

It is to be expected that the parts of the tree distant from the tips
will be less well estimated than the rest and have lower P values.

[Brian Fristensky commented]
>The problem is how you make use
>of the bootstrap results. On the surface, one might be tempted
>to dismiss any tree or clade that didn't have
>a bootstrap value of 95% or greater as meaningless.

Before we get to Brian's further comments, one should note that work by
Zharkikh and Li (MBE 1992) and Hillis and Bull (Syst Biol, 1993) has shown
that the bootstrap P values are biased downwards (actually, towards 50%).  It
seems that values of 70% are often significant.  See also my paper with
Kishino (Syst Biol, 1993).

>1) Because bootstrap resampling of N sites necessarily occurs over
>a number of sites less than N, for any given bootstrapped
>replicate, NO tree based on any replicate will be constructed
>using as much information as a tree that uses all sites.
>In other words, no replicate tree can be as good as the 
>tree made using all the data.

Yes, but the point of the replication is to get a handle on how much
conflict there is among characters, a measure of variability.  The aim is
not to improve the point estimate of the phylogeny, but to put an interval
around it.  So I find (1) not relevant.
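As a sketch of what one replicate does: draw N columns with replacement
from the alignment, rebuild the tree from the pseudo-alignment, and tally
how often a grouping of interest recurs.  (Here `build_tree` and
`has_clade` are hypothetical stand-ins for whatever tree-building method
and clade test you use; this is not ClustalW's actual implementation.)

```python
import random

def bootstrap_support(alignment, build_tree, has_clade, replicates=1000):
    """Fraction of bootstrap replicates whose tree contains a given clade.

    alignment  -- list of equal-length sequences (strings)
    build_tree -- hypothetical stand-in for a tree-building routine
    has_clade  -- hypothetical predicate: does this tree contain the clade?
    """
    n_sites = len(alignment[0])
    hits = 0
    for _ in range(replicates):
        # Draw N columns with replacement: the same amount of data,
        # but sites are reweighted relative to the original sample.
        cols = [random.randrange(n_sites) for _ in range(n_sites)]
        pseudo = ["".join(seq[c] for c in cols) for seq in alignment]
        if has_clade(build_tree(pseudo)):
            hits += 1
    return hits / replicates
```

The support value is the fraction of replicate trees showing the clade,
which is what appears on the nodes of a bootstrapped tree.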

>2) ...
>... Assuming resampling is done
>by drawing sites uniformly with replacement, in any given tree some sites will
>be represented many times, and some sites will not be represented
>at all. Each individual tree is biased towards some subset of
>sites. This should all average out if you do enough bootstrap
>replicates, so that all trees are biased at different places
>each time. 

Average out ... for what purpose?  Again, the purpose is to see what
further sampling from a set of sites with about as much internal
conflict as the present data set would do.  For that, the weighting of
sites more heavily in some samples and less in others is just what is needed.
>Having all trees biased is not a bad thing. In fact, it tries
>to simulate what would happen if we could keep going back 
>to our population and getting fresh data. For very large
>datasets (ie. long sequences) we should always get about
>the same answer.

Again, same answer for what?  The point estimate or the interval?

>Small datasets (eg. short sequences, small numbers of RFLP 
>or RAPD markers) are particularly sensitive to sampling.
> ... Some sequences might 
>cluster close together not because they are closely-related,
>but because the data set we happened to get makes them
>look closely-related. In this way bootstrapping tells us
>that when we have a small dataset, it is inherently
>less reliable than a large one.

Yup, that's what it's intended to do.   The sensitivity to
sample size is a feature, not a bug.

>3) On the other hand, one has to wonder whether resampling
>estimates for small datasets really mean the same thing 
>as estimates based on large datasets. In larger datasets,
>each bootstrap replicate is likely to be unique.
>For small datasets, there's less data to sample, so 
>you keep resampling the same sites over and over. The
>bootstrap estimate carries with it the assumption that
>each replicate is independent of other replicates. That
>probably isn't true for small datasets.

I suspect this is not a valid argument.  When you toss coins
10 times, you have some chance (1/1024) of getting the same 10
outcomes as a previous set of 10.  Is that "nonindependence"?
No, and no statistician worries about it.
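The 1/1024 figure is just (1/2)^10, the chance that 10 independent fair
tosses exactly reproduce one particular earlier sequence:

```python
from fractions import Fraction

# Probability that 10 fair-coin tosses match a given earlier
# sequence of 10 outcomes: each toss matches with probability 1/2.
p = Fraction(1, 2) ** 10
print(p)  # 1/1024
```

Such coincidental repeats are built into sampling with replacement and do
not violate the independence of the replicates.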

>c) Perhaps the best way to use bootstrap estimates is
>as a means of comparing the relative strength to which
>different groupings are supported.

Hillis and Bull said that, and it's an overreaction.  The actual
P values may be biased, but they are meaningful in themselves, not just the
relative rankings of them or the ratios between them.

Joe Felsenstein         joe at
 Dept. of Genetics, Univ. of Washington, Box 357360, Seattle, WA 98195-7360 USA
