# Parametric bootstrapping

Nick Goldman N.Goldman at zoo.cam.ac.uk
Tue May 9 04:29:50 EST 2000

```
Guy Hoelzer wrote:
>
> This leads to a concern that I have about the use of LRTs and parametric
> bootstrapping.  The limits of the null distribution are constrained
> (sometimes greatly) by the assumed evolutionary model, making it easier to
> reject the null by LRT.

The statistics (LRTs and bootstraps) do what you ask them to.  This
comment is leading to a discussion of the usefulness of models, which is
a different question from the one this thread started with.

>  Therefore, the type-I error rate will be larger
> using this approach than if simpler models were assumed.  The question is:
> does parametric bootstrapping lead to an unacceptably high type-I error
> rate?  Of course, this will only be a problem when the assumed
> evolutionary model is not accurate, which is always the case to some
> degree.  For example, if you estimate a TI/TV ratio from your data, and
> assume the veracity of your estimate in your evolutionary model, it is
> probably the case that the TRUE TI/TV ratio was somewhat different than
> the estimate for every branch in the TRUE tree.  Therefore, the null
> distribution you create through repeated simulation is then guaranteed to
> differ from the universe of potential likelihoods that could have been
> explored during the evolution of your taxa.  The realized variation in
> TI/TV ratios would surely broaden the TRUE null distribution, compared to
> the one estimated through simulations, leading to inflated type-I error
> rates in the analysis.  I am curious if there is any evidence relating to
> this potential problem.

I don't necessarily agree that the null distribution estimated by
simulation is *guaranteed* to differ from the universe of
distributions:  it would not, for example, if the distribution is
independent of the TI/TV ratio used when estimating it.  Regular
(asymptotic) statistics
rely on the (asymptotic) independence of the null distribution and the
unknown true values of parameters 'within' it.  I do agree that the
effect you describe *could* exist, particularly for small (whatever that
means!) data sets.  And no, I don't know of any studies that have
investigated this potential effect in phylogenetic applications.

Nick Goldman

-----------------------------------------------------------------------
Nick Goldman, Dept of Zoology,            tel: +44-(0)1223-336649
Downing St, Cambridge CB2 3EJ, U.K.         fax: +44-(0)1223-336679
N.Goldman at zoo.cam.ac.uk   http://www.zoo.cam.ac.uk/zoostaff/goldman
---

```
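To make the independence point concrete, here is a minimal sketch of a parametric bootstrap for an LRT. It is a toy and not a phylogenetic analysis: the two-sample exponential model, the function names and the sample sizes are all assumptions chosen for illustration, with the shared rate standing in for a nuisance parameter such as the TI/TV ratio. In this particular toy the LRT statistic is scale-invariant, so its null distribution does not depend on the rate used in the simulations. Nothing here shows that the same holds for real substitution models.

```python
import math
import random

def exp_loglik(xs, rate):
    """Log-likelihood of i.i.d. exponential observations at a given rate."""
    return len(xs) * math.log(rate) - rate * sum(xs)

def lrt_stat(a, b):
    """2 * (lnL with separate rates - lnL with one shared rate)."""
    pooled = a + b
    rate0 = len(pooled) / sum(pooled)                   # MLE under the null
    rate_a, rate_b = len(a) / sum(a), len(b) / sum(b)   # MLEs under the alternative
    l0 = exp_loglik(pooled, rate0)
    l1 = exp_loglik(a, rate_a) + exp_loglik(b, rate_b)
    return 2.0 * (l1 - l0)

def simulate_null(rate, n_a, n_b, rng):
    """Draw one data set under the null: both groups share the same rate."""
    a = [rng.expovariate(rate) for _ in range(n_a)]
    b = [rng.expovariate(rate) for _ in range(n_b)]
    return a, b

def parametric_bootstrap_pvalue(a, b, n_reps=1000, seed=0):
    """Simulate under the null at the estimated rate and compare to the data."""
    rng = random.Random(seed)
    observed = lrt_stat(a, b)
    rate0 = (len(a) + len(b)) / (sum(a) + sum(b))       # plug-in nuisance estimate
    null_stats = [
        lrt_stat(*simulate_null(rate0, len(a), len(b), rng))
        for _ in range(n_reps)
    ]
    exceed = sum(s >= observed for s in null_stats)
    return (exceed + 1) / (n_reps + 1)                  # add-one to avoid p = 0

def null_stats_for_rate(rate, n_a=10, n_b=10, n_reps=200, seed=1):
    """Null LRT statistics simulated at a chosen (hypothetical) true rate."""
    rng = random.Random(seed)
    return [lrt_stat(*simulate_null(rate, n_a, n_b, rng)) for _ in range(n_reps)]
```

Calling `null_stats_for_rate(1.0)` and `null_stats_for_rate(5.0)` with the same seed gives the same statistics up to floating-point error, which is the toy analogue of a null distribution that is independent of the plugged-in parameter. The concern raised in the quoted text applies to models where no such invariance holds, and this sketch says nothing about how large the effect would be there.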