rate variation in ML models
Andrew Rambaut
andrew.rambaut at zoology.oxford.ac.uk
Thu Oct 2 10:07:00 EST 1997
> sorry for this naive question. I was wondering how the discrete gamma
> distribution is factored into likelihood calculations.
>
> Are particular sites assigned a rate category in advance OR
> does every site have a certain probability of being in one
> of the rate classes (thus the calculations for each rate category
> are done for each site and summed over the whole lot- low
> probability assignments then contribute little to the overall
> probability, and high probability assignments contribute
> a lot to the overall probability).
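The likelihood at each site is calculated for each rate category,
and these values are then summed, each weighted by the probability
of its category. Thus the likelihood is integrated across all
possible rates (given the discrete approximation), and no site is
assigned to a category in advance. The per-site likelihoods are
then multiplied across sites as usual. The categories are calculated
such that a site has an equal probability of being in each (i.e.
0.25 with 4 categories). Each category is then represented by the
mean or median rate within it, with the alpha (shape) parameter
determining what those rates are.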
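As a rough illustration of that calculation, here is a minimal Python
sketch. It assumes a toy setting that is not part of the question: two
aligned sequences under the Jukes-Cantor model, separated by a distance
t in expected substitutions per site. All function names are made up
for the example, and the category rates use the mean-rate version of
the discrete gamma with the mean rate fixed at 1.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-16:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def gamma_quantile(p, a):
    """Quantile of a gamma(shape=a, scale=1) distribution, by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(a, hi) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discrete_gamma_rates(alpha, k):
    """Mean rate of each of k equal-probability categories (overall mean 1)."""
    cuts = [gamma_quantile(i / k, alpha) for i in range(1, k)]
    cdf = [0.0] + [reg_lower_gamma(alpha + 1.0, c) for c in cuts] + [1.0]
    return [k * (cdf[i + 1] - cdf[i]) for i in range(k)]

def jc69_pair_site_likelihood(same, t):
    """Likelihood of one site for two sequences under JC69 at distance t."""
    e = math.exp(-4.0 * t / 3.0)
    p = 0.25 + 0.75 * e if same else 0.25 - 0.25 * e
    return 0.25 * p  # 0.25 is the equilibrium frequency of the first base

def site_likelihood(same, t, rates):
    """Average the per-category likelihoods (equal category probabilities)."""
    return sum(jc69_pair_site_likelihood(same, r * t) for r in rates) / len(rates)

def log_likelihood(sites, t, rates):
    """Sites are independent, so their log-likelihoods add."""
    return sum(math.log(site_likelihood(s, t, rates)) for s in sites)

rates = discrete_gamma_rates(0.5, 4)
sites = [True, True, False, True]  # True = the two sequences match at the site
print(rates)
print(log_likelihood(sites, 0.3, rates))
```

Every site goes through every category, and the averaging happens
inside site_likelihood, before the log is taken and the sites are
combined.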
> Are all among-site-rate-variation models incorporated into
> the calculations the same way?
The categories can be assigned any rates; the gamma just produces
a convenient set of rates, the distribution of which can be
described by a single shape parameter (as opposed to N-1 if you
assign N arbitrary rates).
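To make the point about arbitrary rates concrete, here is a small
Python sketch in which the category rates and their probabilities are
simply supplied by hand. It again assumes a toy two-sequence
Jukes-Cantor setting, and the function names and example numbers are
purely illustrative. The rates are rescaled to a mean of 1, which is
why N rates with fixed probabilities leave only N-1 free parameters.

```python
import math

def jc69_pair_site_likelihood(same, t):
    """Likelihood of one site for two sequences under JC69 at distance t."""
    e = math.exp(-4.0 * t / 3.0)
    p = 0.25 + 0.75 * e if same else 0.25 - 0.25 * e
    return 0.25 * p

def normalise_rates(rates, weights):
    """Rescale the rates so that their weighted mean is exactly 1."""
    mean = sum(r * w for r, w in zip(rates, weights))
    return [r / mean for r in rates]

def mixture_site_likelihood(same, t, rates, weights):
    """Weighted sum of the per-category likelihoods at one site."""
    return sum(w * jc69_pair_site_likelihood(same, r * t)
               for r, w in zip(rates, weights))

# Three hand-picked categories: slow, medium and fast (illustrative values).
weights = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
rates = normalise_rates([0.1, 1.0, 4.0], weights)

print(rates)
print(mixture_site_likelihood(False, 0.3, rates, weights))
```

The likelihood code does not care where the rates came from; a
discrete gamma is one way of generating them from a single parameter.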
> Finally, does the way that one deals with this problem affect
> whether or not the i.i.d. assumption holds for a dataset?
No - each site is still independent and is evolving under the
same model (even though this model has been expanded to allow
rate variation). It is possible to produce models in which adjacent
sites have similar rates (see Yang's recent work).
For a review see Yang (1996) in TREE 11: 367-372
Andrew
