rate variation in ML models

Joe Felsenstein joe at evolution.genetics.washington.edu
Sat Oct 11 09:56:59 EST 1997


[I had written about rate variation among sites]
>> if, as is allowed in my DNAML and Yang's PAML, there is some autocorrelation
>> among sites, then the model isn't i.i.d.   This affects, for example,
>> the validity of bootstrapping.  

[Mark Siddall comments]
>And, for that matter, the entire validity of the analysis (Felsenstein,
>1973).  But, in any case, so long as DNA data are sequenced, and not
>selected at random, the requirements of i.i.d. would seem to me to not
>be met.  There always is autocorrelation among sites in protein coding
>genes and among some sites in rDNA data.  
>That is to say, we have ample evidence that all is not stochastic, so I
>wonder why there is such widespread acceptance of models that are
>predicated on stochasticity in the first place...

The autocorrelation issue is not fatal to likelihood analysis, provided
the analysis makes provision for it.  Ziheng Yang's PAML and my DNAML
both do so, using a Hidden Markov Model approach in which the rate at
each site is a hidden state correlated with the rates at neighboring
sites.  One then is not assuming i.i.d.
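
As a rough illustration of the Hidden Markov Model idea, here is a minimal
sketch in Python.  It is not the code in DNAML or PAML.  It assumes that
the likelihood of each site under each rate category has already been
computed on the tree (the site_liks argument, hypothetical here), and it
assumes one simple form of autocorrelation: with probability lam the next
site keeps the same rate category, otherwise a category is drawn afresh
from the category frequencies freqs.  The function name and arguments are
made up for the example.

```python
import math

def hmm_log_likelihood(site_liks, freqs, lam):
    """Log-likelihood of a sequence of sites with autocorrelated rates.

    site_liks[i][k]: likelihood of site i given rate category k
                     (assumed precomputed on the tree).
    freqs[k]:        prior frequency of category k (must sum to 1).
    lam:             probability that the next site keeps the same category.
    """
    n_cat = len(freqs)
    # Forward variable: joint probability of the data so far and category k.
    fwd = [freqs[k] * site_liks[0][k] for k in range(n_cat)]
    log_lik = 0.0
    for liks in site_liks[1:]:
        # Rescale to avoid underflow, accumulating the log of the scale.
        scale = sum(fwd)
        log_lik += math.log(scale)
        fwd = [f / scale for f in fwd]
        # Since fwd now sums to 1, the transition step reduces to
        # lam * fwd[k] + (1 - lam) * freqs[k].
        fwd = [(lam * fwd[k] + (1.0 - lam) * freqs[k]) * liks[k]
               for k in range(n_cat)]
    return log_lik + math.log(sum(fwd))
```

With lam = 0 this reduces to the usual independent-sites mixture over
rate categories, and with lam = 1 every site shares a single category.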

For that matter, many methods (yes, even parsimony) will not be seriously
misled by autocorrelation, even if they do not attempt to correct for it.
For example, you can use distance methods like Jin and Nei's which assume
gamma-distributed rates.  This does not correct for autocorrelation, but
the autocorrelation will not bias the results one way or the other.  The
same holds for likelihood methods (as in PAUP*) that correct for
gamma-distributed rates.
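
To make the gamma-distance point concrete, here is a minimal Python
sketch.  It assumes a Jukes-Cantor substitution model with gamma shape
parameter alpha, which may differ from the exact formulation Jin and Nei
use; the function names are made up for the example.  Note that the
distance depends only on the proportion of differing sites, not on the
order of the sites along the sequence.

```python
import math

def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)

def jc_distance(p):
    """Jukes-Cantor distance assuming equal rates at all sites."""
    x = 1.0 - 4.0 * p / 3.0
    if x <= 0.0:
        raise ValueError("p too large for a finite distance")
    return -0.75 * math.log(x)

def jc_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates (shape alpha)."""
    x = 1.0 - 4.0 * p / 3.0
    if x <= 0.0:
        raise ValueError("p too large for a finite distance")
    return 0.75 * alpha * (x ** (-1.0 / alpha) - 1.0)
```

For example, jc_gamma_distance(p_distance(s1, s2), 0.5) gives a larger
distance than jc_distance for the same pair, and as alpha grows large
the two converge.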

It _may_ cause trouble for bootstrapping, but there is a "block-bootstrap"
method by Hans Kuensch that can handle that.
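
A minimal sketch of the block-bootstrap idea in Python, assuming the
simple "moving blocks" variant: instead of resampling single sites, one
resamples contiguous blocks of sites, so that correlation between
neighboring sites is preserved within each block.  The function names,
the dictionary representation of the alignment, and the choice of block
length are all assumptions made for the example.

```python
import random

def block_bootstrap_columns(n_sites, block_len, rng):
    """Return n_sites column indices resampled in contiguous blocks."""
    if not 1 <= block_len <= n_sites:
        raise ValueError("block_len must be between 1 and n_sites")
    indices = []
    while len(indices) < n_sites:
        # Pick a block start uniformly among all positions where a
        # full block fits, then take block_len consecutive columns.
        start = rng.randrange(n_sites - block_len + 1)
        indices.extend(range(start, start + block_len))
    # The last block may overshoot; trim to the original length.
    return indices[:n_sites]

def resample_alignment(alignment, block_len, rng):
    """Block-bootstrap an alignment given as {name: sequence string}."""
    n_sites = len(next(iter(alignment.values())))
    cols = block_bootstrap_columns(n_sites, block_len, rng)
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}
```

With block_len = 1 this is the ordinary site-by-site bootstrap.  Usage
would be along the lines of
resample_alignment(aln, 10, random.Random(1)), repeated for each
bootstrap replicate.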


As for the criticism that stochastic models are inappropriate, it might
hinge on whether we can regard the events in a Mendelian segregation
leading to genetic drift as "really" random, and whether unknown
episodes of natural selection causing substitutions can be regarded
as random.  Therein lie deep philosophical waters.

The criticism would lead me to wonder what deterministic models Siddall
has in mind, but I believe that he does not want us to use _any_ models.

-- 
Joe Felsenstein         joe at genetics.washington.edu
 Dept. of Genetics, Univ. of Washington, Box 357360, Seattle, WA 98195-7360 USA



