
Bias and likelihood

Mary K. Kuhner mkkuhner at kingman.genetics.washington.edu
Wed Jun 2 15:00:27 EST 1999

Jean-François Martin  <jfmartin at newsup.univ-mrs.fr> wrote:
>In butterfly mtDNA, the composition bias is extreme toward A-T (80 to 90%
>depending on gene and codon position).
>It seems also unlikely that every kind of substitution has equal probability
>to occur.
>Furthermore, selection against substitutions producing G and C, which has
>been demonstrated in the D-loop of mammals (A-T rich), is not correctly
>represented by ML models. At least as far as I know about Maximum Likelihood
>and PAUP* options, it is impossible to use a non-reversible model. Even if
>it were possible, what kind of weighting scheme could fit to the actual (not
>the observed) substitution pattern?

If you are trying to analyze only butterflies (and not adding in
other taxa with a more normal AT content) I think you could
reasonably use one of the models which allows for unequal nucleotide
frequencies (in PAUP* or PHYLIP).  This is equivalent to assuming that
the high AT content has been in a steady state for a long time.  You
would also use a high transition/transversion ratio to represent the
fact that transversion substitutions are disfavored (whether they
just don't happen, or happen and are selected against, we don't need to
know for this purpose).  Within the group of AT-rich taxa you should
be able to get decent results without needing a non-reversible model.
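
As a rough illustration of what "unequal nucleotide frequencies plus a
transition/transversion parameter" means, here is a minimal Python sketch
of an HKY85-style rate matrix.  The base frequencies and the value of
kappa are made-up numbers for an AT-rich data set, and the function names
are hypothetical.  Note that kappa here is a rate ratio, which is related
to but not defined the same way as the ts/tv ratio options in PAUP* or
PHYLIP.  The last function checks detailed balance, which is the property
that makes the model reversible.

```python
BASES = "ACGT"
PURINES = {"A", "G"}  # C and T are the pyrimidines


def is_transition(x, y):
    """True for A<->G or C<->T (both purines or both pyrimidines)."""
    return (x in PURINES) == (y in PURINES)


def hky85_rate_matrix(freqs, kappa):
    """Build an HKY85-style instantaneous rate matrix (unnormalized).

    freqs: dict of equilibrium base frequencies, assumed to sum to 1.
    kappa: transition/transversion rate ratio (> 1 favors transitions).
    The rate from x to y is freqs[y], times kappa if x->y is a transition.
    """
    q = {}
    for x in BASES:
        row = {}
        for y in BASES:
            if x != y:
                row[y] = freqs[y] * (kappa if is_transition(x, y) else 1.0)
        row[x] = -sum(row.values())  # each row of a rate matrix sums to zero
        q[x] = row
    return q


def is_reversible(q, freqs, tol=1e-12):
    """Check detailed balance: pi_x * q_xy == pi_y * q_yx for all pairs."""
    return all(
        abs(freqs[x] * q[x][y] - freqs[y] * q[y][x]) <= tol
        for x in BASES
        for y in BASES
        if x != y
    )


# Hypothetical AT-rich frequencies (about 85% A+T) and an assumed kappa.
freqs = {"A": 0.42, "C": 0.08, "G": 0.07, "T": 0.43}
q = hky85_rate_matrix(freqs, kappa=8.0)
reversible = is_reversible(q, freqs)
```

With these assumed values the A->G rate (a transition) is larger than the
A->C rate (a transversion) even though G is rarer than C, and the matrix
still satisfies detailed balance, so no non-reversible model is involved.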

If the AT content changes over the course of your tree, I'm not sure
there are any good answers available.  You might consider methods which
assemble the tree from quartets, on the grounds that quartets of
(mostly) closely related taxa will have similar AT content and 
therefore the quartets may be recovered correctly whereas the full
tree would not.
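
One way to see how far the AT content drifts within each quartet is to
list all quartets and report the spread (maximum minus minimum) of AT
content among the four taxa.  The Python sketch below is only a screening
diagnostic under that assumption, not a quartet tree-building method, and
the taxon names and sequences are invented toy data.

```python
from itertools import combinations


def at_content(seq):
    """Fraction of A or T among the unambiguous bases of a sequence."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "AT" for b in counted) / len(counted)


def quartet_at_spread(seqs):
    """Map each quartet of taxon names to its max-minus-min AT content."""
    at = {name: at_content(s) for name, s in seqs.items()}
    spreads = {}
    for quartet in combinations(sorted(seqs), 4):
        values = [at[name] for name in quartet]
        spreads[quartet] = max(values) - min(values)
    return spreads


# Invented toy sequences: t1..t4 are AT-rich, t5 has "normal" AT content.
seqs = {
    "t1": "ATATATATAT",
    "t2": "ATATATATAC",
    "t3": "ATATATATGC",
    "t4": "ATATATAGGC",
    "t5": "ACGTACGTAC",
}
spreads = quartet_at_spread(seqs)
most_homogeneous = min(spreads, key=spreads.get)
```

In this toy data the quartet with the smallest spread is the one that
leaves out t5; quartets with a large spread are the ones where a
stationary-frequency model is most suspect.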

Programming ML with a non-reversible model is...daunting.  (Running
the resulting program would probably be daunting, too:  I'd expect
it to be horrendously slow.)

I'm not sure what you mean by "fit to the actual (not the observed)
substitution pattern."  *Any* method is going to have to try to infer
the substitution pattern:  if we knew the actual substitutions we
wouldn't need ML in the first place.

Hope this helps.

Mary Kuhner mkkuhner at genetics.washington.edu

More information about the Mol-evol mailing list
