DNA vs amino acid sequences
ttha at uhura.cc.rochester.edu
Mon Jun 12 08:05:31 EST 1995
In <3rghpv$hm6 at nuscc.nus.sg> mcbbv at leonis.nus.sg (Venkatesh Byrappa) writes:
> I am now in the process of generating a phylogenetic tree of actin
>sequences from some lower
>vertebrates. I am wondering which sequence I should use - DNA or amino
>acid? Which is more appropriate for a highly conserved protein like the
> Can someone lead me to recent publications that compare the
>pros and cons
>of using DNA vis-a-vis amino acid sequences for generating phylogenetic
I have published trees for tubulin, HMGs, and histones (all from the
lab of M. A. Gorovsky). In all cases I used the amino acid sequences,
for the following reasons:
(Note that these reasons apply mostly to highly conserved proteins
found in species that diverged long ago--like actin. For faster-
evolving proteins the rules are probably different.)
1) CODON BIAS: I was comparing all known sequences, including those from
yeast, protozoa, plants, and vertebrates. It is known that yeasts,
protozoa, and animals have different codon preferences, which would
result in differences in DNA sequence related to codon bias and not
to evolution. Also, some protozoa (ciliates such as Tetrahymena) use the
codons TAA and TAG to encode glutamine, rather than STOP. The inclusion
of unique codons in a subset of the sequences will tend to make that
subset appear more divergent than it really is.
2) LONG TIME HORIZON: I was comparing sequences that have been diverging for
possibly a billion years. In that time, it is very likely that the
wobble bases in the codons will have become randomized. If you exclude
the wobble bases, then you are really looking at amino acid sequence.
3) INTRONS: A DNA sequence comparison should only include coding sequences.
I decided in the interest of time and sanity that I would not go into
the DNA sequences and edit out all the introns in every sequence.
4) MULTIGENE FAMILIES: Humans contain who knows how many histone genes,
but only one peptide sequence for H4 has ever been identified in
humans. If you do DNA sequences, then which genes do you include?
How do you know they are all expressed? If all the H4 genes that are
expressed encode the same protein, then are DNA differences significant?
5) PROTEIN IS THE UNIT OF SELECTION: For protein-encoding genes, the object
on which natural selection acts is the protein itself. The underlying
DNA sequence reflects this process in combination with species-specific
pressures on DNA sequence (like the need for thermophiles to have DNA
that is resistant to melting). If function demands that a protein
maintain a specific sequence, there still is room for the DNA sequence
to change (see #1).
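To make points 1 and 2 concrete, here is a minimal Python sketch. The
sequences are made-up toy examples, and the function names (translate,
first_two_positions) are my own, not from any package. It translates a
coding sequence with either the standard code or a ciliate-style code
(TAA and TAG read as glutamine), and strips the third (wobble) position
from each codon. It assumes an intron-free, in-frame coding sequence.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD = {"".join(c): aa
            for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Ciliate-style code: TAA and TAG encode glutamine (Q) instead of STOP.
CILIATE = dict(STANDARD, TAA="Q", TAG="Q")


def translate(cds, table=STANDARD):
    """Translate an in-frame coding sequence; trailing partial codons are ignored."""
    cds = cds.upper()
    end = len(cds) - len(cds) % 3
    return "".join(table[cds[i:i + 3]] for i in range(0, end, 3))


def first_two_positions(cds):
    """Drop the third (wobble) base of every codon."""
    cds = cds.upper()
    end = len(cds) - len(cds) % 3
    return "".join(cds[i:i + 2] for i in range(0, end, 3))


# Toy example: same protein, different codon preferences.
seq_a = "GCTAAAGAA"
seq_b = "GCCAAGGAG"
print(translate(seq_a), translate(seq_b))
print(first_two_positions(seq_a), first_two_positions(seq_b))

# Toy example: an in-frame TAA read with each code.
print(translate("CAATAAGGT"), translate("CAATAAGGT", CILIATE))
```

The two toy sequences differ at every third position yet give the same
peptide, which is the codon-bias problem in miniature. Note that dropping
third positions is only an approximation of the amino acid sequence,
since a few amino acids (leucine, arginine, serine) also vary at the
first or second position.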
My recommendation: if you can, build the trees both ways and compare them.
For a group of species that diverged relatively recently and are closely
related (like all vertebrates), DNA is probably a good way to go, since you
avoid problems 1 and 2. But check the protein anyway. Be aware of the
problems of multigene families and be careful when you decide to exclude
or include sequences.
If you start broadening your tree to include plants or fungi, or anything
more distant, protein is probably better.
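As a rough illustration of "doing it both ways", the sketch below
computes a simple uncorrected distance (fraction of differing aligned
sites) for the same pair of genes at the DNA level and at the protein
level. The sequences are invented toy data and p_distance is a
hypothetical helper of my own. Real tree-building programs apply
substitution models and corrections for multiple hits that this ignores.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy data: two coding sequences that encode the same peptide (AKE).
dna_1, dna_2 = "GCTAAAGAA", "GCCAAGGAG"
protein_1, protein_2 = "AKE", "AKE"

print("DNA distance:    ", p_distance(dna_1, dna_2))
print("protein distance:", p_distance(protein_1, protein_2))
```

Here the DNA sequences look one-third divergent while the proteins are
identical. If the two kinds of distance disagree like this across your
whole data set, that is a hint that silent changes or codon bias are
driving the DNA tree.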
Tom Thatcher | You can give a PC to a Homo habilis,
University of Rochester Cancer Center | and he'll use it, but he'll use it
ttha at uhura.cc.rochester.edu | to crack nuts.