informative sites & bootstrap
James O. McInerney
j.mcinerney at nhm.ac.uk
Tue May 5 12:31:00 EST 1998
Below we have two answers that appear to be opposed to each other. I
submitted answer number 2. I don't know who submitted answer number 1.
There are two very separate issues here. First, do you leave the
parsimony-uninformative sites in the alignment when you are sampling?
Second, do you leave the parsimony-uninformative sites in the analysis when
you are searching for a tree?
In my opinion the answer to the first [sampling] question is yes. You should
leave the parsimony-uninformative sites in the alignment when you are
generating the samples. The reasons are given in my answer.
To the second question, I think the answer is no, for precisely the reasons
that were given by the submitter of answer 1.
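To make the distinction between the two steps concrete, here is a minimal
sketch in Python. It is only an illustration under stated assumptions: the toy
alignment is made up, the function names are hypothetical, and gaps and
ambiguity codes are treated as ordinary states. It resamples columns from the
whole alignment and only afterwards drops the parsimony-uninformative columns
from each replicate before a tree search would be run.

```python
import random
from collections import Counter

def is_parsimony_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2

def bootstrap_replicate(columns, rng):
    """Step 1: resample sites with replacement from ALL columns."""
    return [rng.choice(columns) for _ in columns]

def informative_only(columns):
    """Step 2: keep only the informative sites for the parsimony search."""
    return [c for c in columns if is_parsimony_informative(c)]

# Toy alignment (made-up data): 4 taxa, 6 sites.
alignment = ["AAGTCA", "AAGTTA", "ACCTTA", "ATCTCA"]
columns = list(zip(*alignment))  # one tuple of states per site

rng = random.Random(1)  # fixed seed so the run is repeatable
for i in range(3):
    replicate = bootstrap_replicate(columns, rng)
    kept = informative_only(replicate)
    print(f"replicate {i}: {len(replicate)} sites sampled, "
          f"{len(kept)} informative sites passed to the tree search")
```

Every replicate has the same length as the original alignment, but the number
of informative sites it contains varies from replicate to replicate, which is
the point of sampling from all sites.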
Are there any other contributions? If I am incorrect in my assumptions, I
would like to know (and also the reasons why I am incorrect). I honestly
don't want to start a cladistic flaming process.
Peter Schuchert wrote:
> to my original question:
> > for a bootstrap analysis, do you use all sites of a sequence
> > alignment or only the informative ones?
> I got the controversial answers given below. Are there more opinions?
> > Actually, that's a good question. I presume you're referring to
> > parsimony analyses, in which only the informative sites should be
> > included for both bootstrap analysis and for finding the best (maximum
> > parsimony) tree. If you include sites with autapomorphies, you'll
> > basically be adding a step per autapomorphous site, so your
> > consistency index will be inflated. If using model-based
> > approaches for your molecular data (e.g., maximum likelihood or
> > distance) use all the sites that are relevant (i.e., not including
> > those where you have alignment ambiguity and so forth).
> > I believe that you should use all sites during bootstrapping.
> > Each bootstrap replicate is a pseudo-sample of the universe of sites.
> > It should, therefore, not be constrained by only sampling the
> > parsimony-informative sites. This means that in some bootstrap
> > replicates there will be _a lot_ of uninformative sites, and in some
> > replicates there will be _few_ uninformative sites. In the subsequent
> > analysis of each bootstrapped dataset, the uninformative sites will
> > not contribute to the phylogeny; however, they should be included in
> > the sampling process.
James O. McInerney email: J.mcinerney at nhm.ac.uk
Molec. Biol. Comput. Officer, phone: +44 171 938 9163
Department of Zoology, Fax: +44 171 938 9158
The Natural History Museum,
London SW7 5BD.