David Witherspoon points out that the base composition in a set of
sequences, used by my DNA substitution model, may not reflect the
pool of nucleotides actually available for substitution.
> ... I don't think I have such a region. Perhaps 3rd position sites in
> an ORF would approximate this? Or would it make sense to use DNAML
> iteratively to find the base pool that yields the highest likelihood (for
> a given data set and tree)? What if neutral regions seem to drift to
> very high AT content? Does this mean that there are almost no G or C
> nucleotides in the base pool? Or does this phenomenon have something to
> do with the 'ease of misincorporation' differing between nucleotides (and
> could that be absorbed into the base pool model)? Am I putting too fine
> a point on a model that was only meant to be a reasonable approximation
> in the first place?
If you try to iteratively find the base composition that yields the highest
likelihood, it won't differ much from the empirical composition. Hasegawa
and Kishino have tried this and it makes little difference.
I think you may be "putting too fine a point" on the model. It was only
intended to be a model that allowed for two departures from the symmetrical
Jukes-Cantor model. It allows for unequal frequencies of transition and
transversion, and for an equilibrium base composition that is unequal and
arbitrary (by the way, it can also be specified by you and not obtained
empirically from the sequences).
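To make those two departures concrete, here is a minimal sketch of a rate
matrix with unequal transition/transversion rates and user-specified
equilibrium frequencies. It assumes an HKY85-style parameterization, which
is similar in spirit to the model in DNAML but is not the same
parameterization. The names rate_matrix, equilibrium_flux and kappa are
illustrative only and do not come from the program.

```python
BASES = "ACGT"
PURINES = frozenset("AG")


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return x != y and (x in PURINES) == (y in PURINES)


def rate_matrix(freqs, kappa):
    """Build an HKY85-style rate matrix (an assumption, not DNAML's own code).

    freqs: dict mapping each base to its equilibrium frequency; these can be
           chosen by the user rather than counted from the sequences.
    kappa: rate of transitions relative to transversions.
    Returns (pi, q), with pi the normalized frequencies in ACGT order.
    """
    total = sum(freqs[b] for b in BASES)
    pi = [freqs[b] / total for b in BASES]
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                # The rate of change to base y is proportional to pi[y].
                q[i][j] = (kappa if is_transition(x, y) else 1.0) * pi[j]
        # The diagonal makes each row sum to zero.
        q[i][i] = -sum(q[i])
    return pi, q


def equilibrium_flux(pi, q):
    """Return the vector pi*Q; every entry is ~0 when pi is the equilibrium."""
    return [sum(pi[i] * q[i][j] for i in range(4)) for j in range(4)]


if __name__ == "__main__":
    # An AT-rich composition supplied by hand, with transitions twice as fast.
    pi, q = rate_matrix({"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}, kappa=2.0)
    print(equilibrium_flux(pi, q))
```

Whatever frequencies are passed in, pi*Q comes out as zero up to rounding,
so they are the equilibrium composition of the process by construction.
Setting all four frequencies equal and kappa to 1 recovers the symmetrical
Jukes-Cantor case.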
As natural selection can accept or reject any substitution, one has to
consider not only the pool(s) of available nucleotides, but the further
skewing of nucleotide composition by this screening process. I certainly
*don't* think that the model of bases falling out of the sequence and
being replaced by dippings into a pool of nucleotides is to be taken
seriously as a proposed mechanism of mutation!
Joe Felsenstein, Dept. of Genetics, Univ. of Washington, Seattle, WA 98195
Internet: joe at genetics.washington.edu (IP No. 126.96.36.199)