Invariable sites question

James McInerney j.mcinerney at nhm.ac.uk
Mon Nov 25 12:07:04 EST 1996

```
Korbinian Strimmer wrote:

> OK, but the problem is which sites are the invariable sites.  The
> likelihood method (as e.g. used in the HKY 1985 paper) uses a
> parameter f corresponding to the probability that a given site is
> invariable (= fraction of invariable sites among all sites).
> In this way all positions are examined and the problem of
> selecting constant sites is circumvented.  The value for f that
> maximizes this likelihood function is the ML estimate of f and
> is smaller than or equal to the fraction of constant sites.
> In theory, if you knew the invariable sites you could drop them
> and you'd get f = 0.0. In practice, you don't know, and you have
> to live with the complete alignment and f > 0.0.
>

>
> I don't understand what these entries are but the parameter usually
> used is the fraction of invariable sites among all sites
> (what you have is the conditional probability of being invariable
> given that a site is constant)
>

Yes, you're correct of course.

> > What about when you are bootstrapping the dataset?
>
> I guess you are talking about bootstrapping ML trees (your introduction!).
> For the ML you estimate your f parameter once with the complete
> data set.  Then you simply do bootstrapping with the whole data
> set and with a fixed f. Don't remove sites unless you know for
> sure that they are invariable (if f = fraction of constant sites
> you can remove all constant positions, of course).
> If you are talking about NJ trees then you should take care that you
> have maximum likelihood distances where f is incorporated.
>

I'm afraid that you may not be absolutely correct that it only applies
to distance methods that have f incorporated (although, obviously you
have a much greater knowledge than I of this subject).  With the latest
version of PAUP you can 'remove' (mathematically, not physically) a
proportion of invariable sites (which must be calculated by ML), for ALL
pairwise distance methods.  This includes LogDet. According to Dave
Swofford, the program estimates the number of invariable sites from the
raw dataset.  Then following each bootstrap resampling, the number of
constant sites in the new dataset is reduced by the appropriate amount.

However, the number of invariable sites is calculated only once.  Is it
not more appropriate to (physically) remove a certain portion of the
constant sites (so that f=0.0) BEFORE carrying out bootstrapping?  It
shouldn't matter which sites are removed, just as long as the correct
number of sites is removed.

> Finally, I'd like to mention the practical problem: though the
> ML method with invariable sites is old (1985) I don't know
> how to do it in practice with any published program (please
> tell me if you know such an ML estimation program!)

PAUP*, the next version of PAUP?

Hope I've made my question clearer.

James

```
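
For readers who want to see the bootstrap procedure discussed above in concrete terms, here is a minimal sketch. It is not PAUP's code: the function names are hypothetical, the proportion of invariable sites (`p_inv`, the f of the thread) is assumed to have been estimated once by ML elsewhere, and a plain Jukes-Cantor distance stands in for whatever distance method is actually used. The only point it illustrates is that f stays fixed while the columns are resampled, and that the invariable sites are discounted mathematically in every replicate rather than physically deleted.

```python
import math
import random


def jc69_distance(seq_a, seq_b, p_inv=0.0):
    """Jukes-Cantor distance between two aligned sequences.

    p_inv is the assumed fraction of ALL sites that are invariable.
    Those sites are discounted from the denominator, not deleted.
    The result is in substitutions per variable site.
    """
    if not 0.0 <= p_inv < 1.0:
        raise ValueError("p_inv must be in [0, 1)")
    n = len(seq_a)
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    n_variable = n * (1.0 - p_inv)   # sites assumed free to vary
    p = diffs / n_variable           # mismatch proportion among them
    if p >= 0.75:
        return float("inf")          # JC69 correction is undefined here
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (one replicate)."""
    n = len(alignment[0])
    cols = [rng.randrange(n) for _ in range(n)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def bootstrap_distances(seq_a, seq_b, p_inv, replicates, seed=0):
    """Distances over bootstrap replicates, with p_inv held fixed."""
    rng = random.Random(seed)
    distances = []
    for _ in range(replicates):
        a, b = bootstrap_columns([seq_a, seq_b], rng)
        distances.append(jc69_distance(a, b, p_inv))
    return distances


if __name__ == "__main__":
    a = "ACGTACGTAC"
    b = "ACGTACGTAA"
    print(jc69_distance(a, b))             # no invariable sites assumed
    print(jc69_distance(a, b, p_inv=0.5))  # half of all sites invariable
    print(bootstrap_distances(a, b, p_inv=0.5, replicates=5))
```

Physically deleting a fixed number of constant columns before resampling would be a different procedure: each replicate would then draw from the reduced alignment, so the number of constant sites per replicate would vary around a different mean.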