Making alignments

Nick Goldman N.Goldman at gen.cam.ac.uk
Tue Jan 20 11:32:34 EST 1998


James McInerney wrote:
> 
> > I've just looked at a chapter by Nick Goldman:
> > Goldman N. Phylogenetic estimation. In: Bishop
> > MJ, Rawlings CJ, eds.  DNA and Protein Sequence Analysis.
> > A Practical Approach. Oxford: IRL Press, 1997:(Rickwood D,
> > Hames BD, eds. The Practical Approach Series; vol 171).
> >
> > He writes (p297): 'DNA sequences must contain more information than
> > amino acid sequences and phylogenetic estimation methods based
> > on DNA are generally better developed than for amino acids.
> > Consequently, I recommend the use of DNA sequences whenever the
> > choice exists.'
> >
> 
> Yes, I read this also and I thought it was a bizarre thing to say.  Given
> that convergence in base composition (two very distant sequences
> converging on a similar base composition) is widespread, and that
> synonymously degenerate third positions very quickly become saturated
> with substitutions, I cannot recommend the use of DNA sequences when
> these sequences can be translated into proteins.  I wish that he had
> elaborated on this sentence.

Please recall that I was writing about phylogenetic estimation, and not
about sequence alignment.

I stand by my comment that DNA sequences contain more information:  you
can translate DNA --> amino acids, but not vice versa without
ambiguity.  Therefore, all other things being equal, it would be better
to base analyses on the DNA sequences.
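
As a minimal illustration of this asymmetry, the Python sketch below
translates a short DNA sequence and then counts how many distinct DNA
sequences would encode the same protein.  It assumes the standard
genetic code; the example sequence and the names translate and
count_back_translations are hypothetical, chosen only for this sketch.

```python
from collections import Counter
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# '*' marks the three stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of codons that encode each amino acid (e.g. 6 for leucine).
DEGENERACY = Counter(CODON_TABLE.values())


def translate(dna):
    """DNA --> amino acids: each codon maps to exactly one residue."""
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_back_translations(protein):
    """Amino acids --> DNA: count the DNA sequences that fit the protein."""
    return prod(DEGENERACY[aa] for aa in protein)


dna = "ATGCTGAGC"  # hypothetical example sequence
protein = translate(dna)
print(protein, count_back_translations(protein))  # prints: MLS 36
```

Even for three residues, 36 different DNA sequences give the same
protein, so the amino acid sequence alone cannot tell you which one was
observed.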

If you are worried about (e.g.) saturation at third codon positions,
then a well-developed model of DNA evolution should allow you to
retrieve some of whatever phylogenetic information IS left, whilst
simultaneously incorporating the information from other positions and
automatically adjusting the "weight" given to each in the appropriate
manner.  In this example (saturation at some sites), models/methods that
can do this include (1) those in PHYLIP's DNAML, using the methods that
allow for different rates at different positions, (2) those in Ziheng
Yang's PAML package (similar sort of thing; see
http://abacus.gene.ucl.ac.uk/ziheng/paml.html), and (3) the method
devised by Goldman & Yang (1994:  Mol. Biol. Evol. 11:725--736), which
has a 61-state model with each sense codon treated separately (also
implemented in PAML).
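
To show where the 61 states come from, and why third positions behave
differently from the others, here is a rough Python sketch that
enumerates the sense codons of the standard genetic code and measures
what fraction of single-base changes at each codon position is
synonymous.  It only builds the state space; it is not the Goldman &
Yang rate matrix, and the function name synonymous_fraction is a
hypothetical one used just for this illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# '*' marks the three stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# The states of a codon-based model: all codons except the stops.
SENSE_CODONS = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]


def synonymous_fraction(position):
    """Fraction of single-base changes at a codon position (0, 1 or 2)
    that leave the amino acid unchanged, counting only changes from one
    sense codon to another sense codon."""
    synonymous = total = 0
    for codon in SENSE_CODONS:
        for base in BASES:
            if base == codon[position]:
                continue
            neighbour = codon[:position] + base + codon[position + 1:]
            if CODON_TABLE[neighbour] == "*":
                continue  # changes to stop codons are outside the state space
            total += 1
            synonymous += CODON_TABLE[neighbour] == CODON_TABLE[codon]
    return synonymous / total


print(len(SENSE_CODONS))  # prints: 61
for position in range(3):
    print(position + 1, round(synonymous_fraction(position), 3))
```

Most third-position changes are synonymous, a few first-position
changes are, and no second-position change between sense codons is.  A
model that treats each sense codon as a state can take these
differences into account directly rather than discarding the third
positions.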

Phylogenetic analyses from amino acid sequences tend to be less
well-developed, particularly in the models of sequence evolution they
consider (typically, no rate heterogeneity amongst sites; less realistic
description of replacement patterns).  There are exceptions, but far be
it from ME to plug more recent work by Thorne, Goldman & Jones and
Goldman, Thorne & Jones.

Hoping this explains what I was trying to say without getting me into
the usual flame wars,

  Nick Goldman

-----------------------------------------------------------------
  Nick Goldman                e-mail:  N.Goldman at gen.cam.ac.uk
  Department of Genetics
  Cambridge



