Does anyone have experience with, or opinions about, calculating
distances between distantly related sequences?
I have a group of 11 homologous amino acid sequences, 242 to 290
residues long; one is bacterial, the rest are eukaryotic. I'm trying to
determine whether the bacterial sequence arose by lateral gene transfer
or by direct (vertical) descent. Sequence distances were calculated with
Felsenstein's PROTDIST (part of the PHYLIP package) using the Dayhoff
PAM matrix.
The average distance between vertebrate and invertebrate sequences is
95 +/- 3 PAM. Assuming these proteins are orthologous (they probably
are), and that the vertebrate/invertebrate divergence occurred 600 Myr
ago, the substitution rate is 95 PAM / 0.6 Gyr = 158 PAM/Gyr. The
average distance from the bacterial to the eukaryotic sequences is
415 +/- 30 PAM, which at that rate corresponds to a divergence time of
415 / 158 = 2.6 Gyr. This is well short of the estimated 3.5 Gyr since
the prokaryote/eukaryote divergence.
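The rate and divergence-time arithmetic above can be checked with a short Python sketch. It is only a back-of-envelope calculation under assumptions that are not in the post: the +/- values are treated as independent standard errors combined by first-order propagation, and the 600 Myr calibration date is treated as exact. The function and constant names are hypothetical.

```python
import math

# Figures from the post; the calibration date is treated as exact (an assumption).
VERT_INVERT_PAM, VERT_INVERT_SD = 95.0, 3.0
CALIBRATION_GYR = 0.6
BACT_EUK_PAM, BACT_EUK_SD = 415.0, 30.0
PROK_EUK_SPLIT_GYR = 3.5


def calibrated_rate(distance_pam, sd_pam, divergence_gyr):
    """Pairwise rate in PAM per Gyr of divergence time.

    A pairwise distance accumulates along both lineages since the split,
    so the per-lineage rate would be half of this value.
    """
    return distance_pam / divergence_gyr, sd_pam / divergence_gyr


def divergence_time(distance_pam, sd_pam, rate, rate_sd):
    """t = D / r, combining the relative errors of D and r in quadrature
    (first-order propagation, assuming the two errors are independent)."""
    t = distance_pam / rate
    rel_sd = math.hypot(sd_pam / distance_pam, rate_sd / rate)
    return t, t * rel_sd


rate, rate_sd = calibrated_rate(VERT_INVERT_PAM, VERT_INVERT_SD, CALIBRATION_GYR)
t, t_sd = divergence_time(BACT_EUK_PAM, BACT_EUK_SD, rate, rate_sd)
print(f"rate: {rate:.0f} +/- {rate_sd:.0f} PAM/Gyr")
print(f"bacterial/eukaryotic split: {t:.1f} +/- {t_sd:.1f} Gyr "
      f"(vs. {PROK_EUK_SPLIT_GYR} Gyr expected)")

# Distance needed to reach the expected split at this rate, and what larger
# distances would imply (their errors are unknown, so only t is printed).
print(f"distance needed for {PROK_EUK_SPLIT_GYR} Gyr: "
      f"{rate * PROK_EUK_SPLIT_GYR:.0f} PAM")
for d in (600.0, 800.0):
    t_alt, _ = divergence_time(d, 0.0, rate, rate_sd)
    print(f"{d:.0f} PAM -> {t_alt:.1f} Gyr")
```

With these inputs the sketch gives 158 +/- 5 PAM/Gyr and 2.6 +/- 0.2 Gyr, and shows that roughly 554 PAM would be needed to reach 3.5 Gyr at this rate; 600 and 800 PAM would correspond to about 3.8 and 5.1 Gyr. Using a pairwise (two-lineage) rate is consistent here because it is applied only to pairwise distances.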
BUT how accurate is the 415 PAM estimate? There are enough conserved
residues that I'm confident the alignment is accurate, but with an
average sequence identity of only 15%, I am concerned that the PAM
matrix cannot adequately correct for superimposed (multiple)
substitutions at the same site. Is the true distance really more like
600 or 800 PAM?
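One way to see why 15% identity is worrying is to look at a closed-form correction that PROTDIST also offers, Kimura's approximation D = -ln(1 - p - 0.2 p^2), where p is the fraction of aligned sites that differ. This is not the Dayhoff PAM distance used above (which, per the PHYLIP documentation, is estimated by maximum likelihood under the PAM model), so the numbers from this Python sketch are only illustrative. They do show the general behaviour of such corrections near saturation: the curve becomes so steep that small errors in observed identity translate into large errors in distance.

```python
import math


def kimura_protein_distance(p):
    """Kimura's approximate protein distance, in substitutions per site.

    p is the fraction of aligned sites at which the two sequences differ.
    Returns math.inf once 1 - p - 0.2*p**2 <= 0 (p >= ~0.854), where the
    formula has no finite solution.
    """
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        return math.inf
    return -math.log(arg)


# Difference fraction at which the distance becomes infinite:
# positive root of 0.2*p^2 + p - 1 = 0.
p_max = (-1.0 + math.sqrt(1.8)) / 0.4
print(f"no finite distance below {100 * (1 - p_max):.1f}% identity")

# Multiply substitutions per site by 100 to express the result in PAM units.
for identity in (0.25, 0.20, 0.17, 0.16, 0.15, 0.148):
    d = kimura_protein_distance(1.0 - identity)
    print(f"{100 * identity:5.1f}% identity -> {100 * d:.0f} PAM")
```

Under this approximation the estimate rises from roughly 200 PAM at 25% identity to about 400 PAM at 16% and 520 PAM at 15%, so a one-point change in identity near the 15% level shifts the distance by more than 100 PAM, and below about 14.6% identity no finite distance exists at all.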
Anyone care to venture an opinion?
Paul Berti
Biotechnology Research Institute
National Research Council of Canada
Paul.Berti at BRI.NRC.CA