The replies that I have got by email and posted here mostly
agree that amino acid sequences should be used for making
the alignment of coding regions. This alignment can then
be converted into the corresponding nucleotide alignment
for making trees. This makes biological sense to me.
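The conversion step can be sketched in a few lines of Python. This is only a minimal illustration: the function name `thread_codons` is my own and does not come from any of the programs mentioned in this thread, and it assumes each unaligned coding sequence is in frame and corresponds codon for codon to its aligned protein.

```python
def thread_codons(aligned_protein, dna, gap="-"):
    """Insert gaps into an unaligned coding sequence so that it follows
    the gap pattern of its aligned protein sequence (hypothetical helper)."""
    # Split the coding sequence into codons.
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(dna) % 3 != 0 or len(codons) < residues:
        raise ValueError("coding sequence does not match the protein")
    next_codon = iter(codons)
    out = []
    for aa in aligned_protein:
        # A protein gap becomes three nucleotide gaps; a residue takes the next codon.
        out.append(gap * 3 if aa == gap else next(next_codon))
    # Codons beyond the last residue (e.g. a stop codon) are dropped.
    return "".join(out)


print(thread_codons("M-K", "ATGAAA"))
```

Note that the sketch does not check that each codon actually translates to the residue it is paired with; a real tool should verify this.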
I've just looked at a chapter by Nick Goldman:
Goldman N. Phylogenetic estimation. In: Bishop
MJ, Rawlings CJ, eds. DNA and Protein Sequence Analysis.
A Practical Approach. Oxford: IRL Press, 1997:(Rickwood D,
Hames BD, eds. The Practical Approach Series; vol 171).
He writes (p. 297): 'DNA sequences must contain more information than
amino acid sequences and phylogenetic estimation methods based
on DNA are generally better developed than for amino acids.
Consequently, I recommend the use of DNA sequences whenever the
Also, concerning indels or gaps, he writes (p. 283):
'A gap is difficult to interpret in evolutionary terms, and there
are no reliable methods which can use the information held in patterns
of gaps. Some phylogenetic analyses are able to use the nucleotides
or amino acids present at positions where some sequences have gaps;
others cannot, and those positions must be discarded from the
data to be analysed, even if only one sequence has a gap.'
So, should I strip my alignment of gaps? And can anyone recommend
programs to do this?
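For reference, the strictest form of gap stripping that Goldman describes (discarding a position if even one sequence has a gap there) can be sketched as follows. This is my own rough illustration, not an existing program: the name `strip_gap_columns` and the choice of `-` and `.` as gap characters are assumptions.

```python
def strip_gap_columns(alignment, gap_chars="-."):
    """Remove every column in which at least one sequence has a gap.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    if not alignment:
        return {}
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # Keep only the column indices where no sequence has a gap character.
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


print(strip_gap_columns({"a": "AC-GT", "b": "ACTG-"}))
```

For a codon alignment one would probably want to drop whole codons rather than single columns, so that the reading frame is preserved; the sketch above does not do that.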
Another question. I've heard that a sequence dataset needs a minimum
of 20 informative positions to be usable. By an informative position
I mean any position where at least one sequence has a different
residue. Any comments?
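To check a dataset against a threshold like this, the positions can be counted with a short script. The sketch below is only illustrative and the name `count_sites` is hypothetical. It counts positions as defined above (at least one sequence differs, usually called variable sites) and, for comparison, the stricter parsimony-informative sites, where at least two different residues each occur in at least two sequences. Gap characters are ignored, which is an assumption on my part.

```python
from collections import Counter


def count_sites(seqs, gap_chars="-.?"):
    """Return (variable, parsimony_informative) site counts for aligned sequences."""
    variable = informative = 0
    for column in zip(*seqs):
        # Tally the residues in this column, ignoring gaps and case.
        counts = Counter(c.upper() for c in column if c not in gap_chars)
        if len(counts) > 1:
            variable += 1
        # Parsimony-informative: at least two residues, each seen at least twice.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return variable, informative


print(count_sites(["AAGA", "AAGA", "ACTA", "ACTT"]))
```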
Regarding converting the aa alignment to the corresponding nt
alignment, Warren Gallin recommends 'using DNA Stacks,
a freely available Hypercard program written by Doug Eernisse.
The last web page address that I have for this program is:
and James O. McInerney recommends:
'If you want to semi-automate this task, get a copy of
clustalw (ftp.ebi.ac.uk) and you can get two programs
from the Natural History Museum ftp site
(ftp://ftp.nhm.ac.uk/pub/gcua) called translfas and
putgaps, which will translate the DNA sequences (in fasta
format) and then after alignment you can use putgaps to
insert gaps in the dna sequences according to where they
are found in the amino acid alignment. It's a bit of a