Bootstrap vs. Jackknife
rodrigo at u.washington.edu
Mon Aug 17 13:24:43 EST 1998
Tim Bushnell wrote in message <6r9n7s$han at net.bio.net>...
>I have been doing a lot of reading recently about phylogenetic analysis,
>in particular using 16S rRNA molecules for phylogenetic studies in
>bacteria. One thing that has struck me is that in all cases I have read
>only bootstrap analysis is performed to examine the robustness of a given
>I have played with some sequence data doing both bootstrap and jackknife
>(using PHYLIP SEQBOOT) analysis. I have noticed that there are not real
>differences in the support for a given node in either case +/- 1 to 3%.
>Is there any reason for the extensive use of the bootstrap and almost
>completely ignoring the jackknife?
>Thanks in advance for any ideas or comments.
I guess the main concern with using the jackknife is what criterion one
should use to decide how many sites to exclude. Joe Felsenstein, in his
1985 paper on the phylogenetic bootstrap and also in the documentation to
PHYLIP, suggests deleting 50% of the sites (the delete-half jackknife). With
well-behaved univariate measures, Wu (1985) showed that the delete-half
jackknife gives values that are essentially identical to the bootstrap (this
paper was pointed out to me by Joe; actually it's not quite delete-half,
because there's a correction in there for the number of parameters
estimated, but with a large number of sites, deleting half works well enough).
However, in a recent paper in Cladistics, Steve Farris et al. (1995?) argue
that one should delete exp(-1) proportion of sites (i.e., a delete-37%
jackknife). They stress that their argument is not based on statistical
considerations -- you should consult their paper for more details.
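To make the three resampling schemes concrete, here is a minimal Python
sketch of how site (column) indices might be drawn for a bootstrap replicate
and for a delete-fraction jackknife replicate. This is not PHYLIP's SEQBOOT
code; the function names, the toy alignment, and the choice to round the
number of deleted sites are all assumptions made for illustration.

```python
import math
import random

def bootstrap_sites(n_sites, rng):
    """One bootstrap replicate: draw n_sites column indices WITH replacement."""
    return [rng.randrange(n_sites) for _ in range(n_sites)]

def jackknife_sites(n_sites, delete_fraction, rng):
    """One jackknife replicate: keep a random subset of columns WITHOUT
    replacement, dropping roughly delete_fraction of them (rounding is an
    assumption of this sketch)."""
    n_delete = round(n_sites * delete_fraction)
    return sorted(rng.sample(range(n_sites), n_sites - n_delete))

def resample_alignment(alignment, columns):
    """Build a pseudo-replicate alignment from the chosen column indices."""
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

# Hypothetical toy alignment, 10 sites long.
alignment = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTTCGTAC",
    "taxonC": "ACGAACGTTC",
}
n = 10
rng = random.Random(1)

boot_cols = bootstrap_sites(n, rng)                # 10 columns, repeats allowed
half_cols = jackknife_sites(n, 0.5, rng)           # delete-50%: keeps 5 columns
e_cols = jackknife_sites(n, math.exp(-1), rng)     # delete-37%: keeps 6 columns

replicate = resample_alignment(alignment, half_cols)
```

Each replicate alignment would then be passed to whatever tree-building
method is in use, and the support for a node is the fraction of replicates
in which it is recovered.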
I did some very simple simulations to compare the delete-50% to the
delete-37%. I found that when there were very few substitutions in the
sequence set, the delete-37% was closer to the bootstrap values than the
delete-50%, which tended to underestimate the bootstrap score. However, when
the rate of substitution was high, the delete-50% was closer to the
bootstrap score, and the delete-37% tended to overestimate the bootstrap
score. This was true for all bootstrap scores I examined (they ranged from
~40% to 100%), but at the tail ends (i.e., at ~40% and 100%) both the
delete-37% and delete-50% tended to converge. One would expect this since
these probably represent absorbing boundaries.
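A rough back-of-the-envelope calculation is consistent with the
low-substitution case above. It is not the simulation I ran, and it rests on
a strong simplifying assumption: a group is supported by k sites, no site
contradicts it, and the group is recovered whenever at least one of those k
sites survives resampling. The function names and the choice of 1000 sites
are hypothetical.

```python
import math

def bootstrap_retention(n_sites, k):
    """P(at least one of k supporting sites is drawn in a bootstrap replicate).
    Each of the n_sites draws misses all k sites with probability 1 - k/n."""
    return 1.0 - (1.0 - k / n_sites) ** n_sites

def jackknife_retention(n_sites, k, delete_fraction):
    """P(at least one of k supporting sites survives when a fixed number of
    sites, round(n_sites * delete_fraction), is deleted at random."""
    n_delete = round(n_sites * delete_fraction)
    if n_delete < k:
        return 1.0  # cannot delete all k supporting sites
    # Probability that all k supporting sites fall among the deleted ones.
    p_all_deleted = (math.comb(n_sites - k, n_delete - k)
                     / math.comb(n_sites, n_delete))
    return 1.0 - p_all_deleted

n = 1000
for k in (1, 2, 3):
    boot = bootstrap_retention(n, k)
    jack50 = jackknife_retention(n, k, 0.5)
    jack37 = jackknife_retention(n, k, math.exp(-1))
    print(k, round(boot, 3), round(jack50, 3), round(jack37, 3))
```

For a single supporting site, the bootstrap and the delete-37% jackknife
both come out near 1 - exp(-1), about 0.63, while the delete-50% jackknife
gives 0.5. Under this toy model the delete-50% score sits below the
bootstrap score and the delete-37% score tracks it closely. The model has no
conflicting sites, so it says nothing about the high-rate case.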
Of course these results aren't very helpful for deciding when we should use the
delete-50% or the delete-37%. However, one may conjecture that there is a
general rule-of-thumb as follows: delete-50% scores are less than or equal
to bootstrap scores which are less than or equal to delete-37% scores.
Since it is often claimed that bootstrap values are conservative with
respect to group-support, it is possible that the use of the delete-37% may
alleviate this problem.
-- Allen Rodrigo