
informative sites & bootstrap

James O. McInerney j.mcinerney at nhm.ac.uk
Tue May 5 12:31:00 EST 1998


Below we have two answers that appear to be opposed to each other.  I
submitted answer number 2.  I don't know who submitted answer number 1.

There are two very separate issues here.  First, do you leave the
parsimony-uninformative sites in the alignment when you are sampling?
Second, do you leave the parsimony-uninformative sites in the analysis when
you are searching for a tree?

In my opinion the answer to the first [sampling] question is yes.  You should
leave the parsimony-uninformative sites in the alignment when you are
generating the samples.  The reasons are given in my answer.

To the second question, I think the answer is no, for precisely the reasons
that were given by the submitter of answer 1.
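
The two steps can be sketched in a few lines of Python.  This is only an
illustration under stated assumptions, not the procedure of any particular
phylogenetics package: the function names and the toy alignment are
hypothetical, "-" and "?" are assumed to mean missing data, and a site is
counted as parsimony-informative when at least two character states each
occur in at least two taxa.

```python
import random
from collections import Counter

MISSING = "-?"  # assumption: these symbols mark gaps or missing data


def columns(alignment):
    """Turn a list of equal-length sequences (one per taxon) into site columns."""
    return ["".join(col) for col in zip(*alignment)]


def is_parsimony_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(c for c in column if c not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def bootstrap_replicate(cols, rng):
    """Step 1: resample ALL sites with replacement, informative or not."""
    return [rng.choice(cols) for _ in cols]


def informative_only(cols):
    """Step 2: keep only the informative sites for the parsimony tree search."""
    return [c for c in cols if is_parsimony_informative(c)]


# Hypothetical toy alignment: 4 taxa, 6 sites.
alignment = [
    "AAGTCA",
    "AAGTTA",
    "ACCTTG",
    "ACCGTG",
]

cols = columns(alignment)
rng = random.Random(1998)  # fixed seed so the run is repeatable
for i in range(5):
    replicate = bootstrap_replicate(cols, rng)
    kept = informative_only(replicate)
    print(f"replicate {i}: {len(replicate)} sites sampled, "
          f"{len(kept)} informative sites passed to the search")
```

Every replicate has the same number of sites as the original alignment, but
the number of informative sites it contains varies from replicate to
replicate, which is the point of sampling from all sites.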

Are there any other contributions?  If I am incorrect in my assumptions, I
would like to know (and also the reasons why I am incorrect).  I honestly
don't want to start a cladistic flaming process.



Peter Schuchert wrote:
> to my original question:
> > for a bootstrap analysis, do you use all sites of a sequence
> > alignment or only the informative ones?
> I got the controversial answers given below. Are there more opinions?
> Peter
> 1)
> > Actually, that's a good question.  I presume you're referring to
> > parsimony analyses, in which only the informative sites should be
> > included for both bootstrap analysis and for finding the best (maximum
> > parsimony) tree.  If you include sites with autapomorphies, you'll
> > basically be adding a step per autapomorphous site, so your
> > consistency index will be underestimated.  If using model-based
> > approaches for your molecular data (e.g., maximum likelihood or
> > distance) use all the sites that are relevant (i.e., not including
> > those where you have alignment ambiguity and so forth).
> 2)
> > I believe that you should use all sites during bootstrapping.
> > Each bootstrap replicate is a pseudo-sample of the universe of sites.
> > It should, therefore, not be constrained by only sampling the
> > parsimony-informative sites.  This means that in some bootstrap
> > replicates there will be _a lot_ of uninformative sites, and in some
> > replicates there will be _few_ sites.  In the subsequent
> > analysis of each bootstrapped dataset, the uninformative sites will
> > not contribute to the phylogeny, however, they should be included in
> > the sampling process.

James O. McInerney               email: J.mcinerney at nhm.ac.uk
Molec. Biol. Comput. Officer,    phone: +44 171 938 9163
Department of Zoology,           Fax:   +44 171 938 9158
The Natural History Museum,
Cromwell Road,                    
London SW7 5BD.                  

More information about the Mol-evol mailing list
