NA vs. AA phylogenies

Andrew J. Roger aroger at
Sun Jan 21 14:15:25 EST 1996

Ulrich Melcher <umelcher at> wrote:
>	    Molecular phylogenies can be constructed from optimal 
>(algorithm-defined) alignments of amino acid sequences or of nucleotide 
>sequences.  Some folks construct phylogenies based on each.  The two 
>phylogenetic trees may differ from one another.  Novices will often 
>interpret these differences with regard to coding vs. non-coding 
>changes, while experts will recognize that the alignments on which the 
>trees are based are not necessarily equivalent.
>     Is there software that can:
>1) Given an amino acid sequence alignment and the raw nucleotide 
>sequences that translate to the amino acid sequences, produce a 
>nucleotide sequence alignment that is equivalent to the amino acid 
>sequence alignment; and/or
>2) Given a nucleotide sequence alignment, translate the constituent 
>sequences into aligned amino acid sequences, so that the resulting amino 
>acid sequence alignment is equivalent to the nucleotide sequence 
>alignment?

Thanks for asking this question. Having access to this software
is going to save us a lot of time!
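
For readers who want to see what task (1) in the quoted question
amounts to, here is a minimal sketch in Python. It is not any
particular program mentioned in this thread. The function name
thread_codons and the toy sequences are made up for illustration,
and it assumes each raw nucleotide sequence is in frame, starts at
the first codon, and contains no frameshifts. It does not check that
the codons actually translate to the aligned residues.

```python
def thread_codons(aligned_protein, raw_dna, gap="-"):
    """Lay the codons of raw_dna onto one row of a protein alignment.

    Each residue consumes the next three nucleotides; each protein gap
    becomes a three-column nucleotide gap. Trailing nucleotides (for
    example a stop codon) are ignored.
    """
    residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(raw_dna) < 3 * residues:
        raise ValueError("nucleotide sequence too short for this protein row")

    pieces = []
    pos = 0  # index of the next unused nucleotide in raw_dna
    for ch in aligned_protein:
        if ch == gap:
            pieces.append(gap * 3)
        else:
            pieces.append(raw_dna[pos:pos + 3])
            pos += 3
    return "".join(pieces)


# Hypothetical toy data: two aligned protein rows and their unaligned CDSs.
protein_alignment = {"seqA": "MK-L", "seqB": "MKTL"}
raw_nucleotides = {"seqA": "ATGAAACTG", "seqB": "ATGAAGACCTTA"}

codon_alignment = {
    name: thread_codons(row, raw_nucleotides[name])
    for name, row in protein_alignment.items()
}

for name, row in codon_alignment.items():
    print(name, row)
```

Because gaps are only ever inserted in multiples of three, the
resulting nucleotide alignment has exactly the same homology
statements as the protein alignment, which is the sense of
"equivalent" in the question.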

Dare I suggest we now discuss the relative merits of 
amino acid versus nucleotide phylogenetic analysis
on coding sequences?  

My feeling is that we currently have programs that
use empirically derived substitution matrices for
amino acids (for distance we can use Joe's PROTDIST;
for likelihood we can use PROTML, or AAML from Yang's
PAML package), and thus we are more likely to be
basing our analyses on reasonable assumptions about amino
acid evolution. However, it is common to see
nucleotide programs used with coding sequences. Typically
people get rid of third base positions for deep
phylogenetic questions. Still, I believe that using DNADIST
or DNAML with settings meant for non-coding sequences
(for instance, Kimura's two-parameter model with
a Ts/Tv ratio > 1) is inferior to using
amino acids.
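
To make the nucleotide-level practice described above concrete, here
is a rough Python sketch that drops third codon positions from an
in-frame alignment and then computes a Kimura two-parameter distance
from the remaining sites. The function names are invented for this
example. The distance uses the textbook closed-form estimate
d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the
observed proportions of transitions and transversions; that is an
assumption of this sketch and not a description of how DNADIST
computes its distances.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def drop_third_positions(aligned_cds):
    """Keep only first and second codon positions of an in-frame row."""
    return "".join(ch for i, ch in enumerate(aligned_cds) if i % 3 != 2)


def k2p_distance(seq1, seq2):
    """Closed-form Kimura two-parameter distance between two aligned rows.

    Columns containing anything other than A, C, G or T (gaps,
    ambiguity codes) are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if sites == 0:
        raise ValueError("no comparable sites")

    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are saturated.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Hypothetical toy rows, already aligned in frame.
row1 = drop_third_positions("ATGAAACTGGCT")
row2 = drop_third_positions("ATGAAGCTAGTT")
print(row1, row2, k2p_distance(row1, row2))
```

Note that nothing in this calculation knows which amino acids the
codons encode, which is the limitation at issue here.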
My reasons are:
Coding sequences will evolve in accordance with
the following constraints:
The ease with which one codon will change to another will 
depend upon:
-the chemical properties of the coded amino acid
-which codons it is synonymous with
-what kinds of mutations are more frequent than others

The different codon positions will be differently susceptible to
each of these forces:
-the first position will be largely constrained by amino acid coding
-the second position will be almost entirely constrained by amino
acid coding
-the third position will be subject to mutational biases within a
framework of synonymous codons
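
The position-by-position claims above can be checked against the
standard genetic code with a short Python sketch. For each codon
position it counts what fraction of single-nucleotide changes to a
sense codon leave the encoded amino acid unchanged. The function
name is invented for this example, and two simplifying assumptions
are made: all single-base changes are treated as equally likely, and
changes to a stop codon are counted as non-synonymous.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_fraction(position):
    """Fraction of single-base changes at a codon position (0, 1 or 2)
    that are synonymous, taken over all sense codons."""
    synonymous = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # start only from sense codons
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            if CODE[mutant] == aa:
                synonymous += 1
    return synonymous / total


fractions = [synonymous_fraction(pos) for pos in range(3)]
for pos, frac in enumerate(fractions, start=1):
    print(f"codon position {pos}: {frac:.3f} of changes are synonymous")
```

Under these assumptions no change at the second position is
synonymous, only a handful at the first position are (within the
leucine and arginine codon families), and most changes at the third
position are.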

Thus, until an empirical model of codon evolution is implemented
in distance and ML methods (there is currently only one
implementation that I know of: the CODONML program in Yang's
PAML package), amino acid level analyses are preferable
(to me at least).

Does anyone have any opinions on this matter?

Andrew J. Roger
aroger at

More information about the Mol-evol mailing list