In article <DIx4xr.F9t at gpu.utcc.utoronto.ca>, lamoran at gpu.utcc.utoronto.ca
(L.A. Moran) wrote:
> There are several reasons why GS can be a better marker than 16S RNA.
> 1. You can use amino acid sequences in the alignment and analysis.
> This avoids the compositional bias of nucleotide sequences.
> 2. The entire sequence can be aligned - you don't have to discard
> some ambiguous sequences as you do with 16S RNA.
> 3. The GS gene is more highly conserved than 16S RNA.
> 4. There are paralogous GS genes and that allows the universal
> tree to be rooted.
Well, I want to join the "rRNA good vs. rRNA bad" debate, so here goes.
I agree that there are potential problems with SS-rRNA based trees.
However, there are many reasons why SS-rRNA trees should be good,
including: lack of paralogy, low likelihood of lateral transfer, the
ability to use secondary structure to guide alignments even in regions of
low sequence conservation, and the enormous number and diversity of
sequences available. Yet despite all of this, it is possible that some
parts of the trees derived from rRNA sequences are inaccurate. Why else
would trees of so many other genes show at least some differences from the
rRNA trees?
Well, as D. Edgell has pointed out, differences between the tree of a
given gene and the SS-rRNA tree could be due to poor species
sampling. When we compare the trees of a particular gene (such as GS) to
those for rRNA, there are many reasons why the trees may differ,
including:
1. different histories of the genes (e.g., due to lateral transfer,
gene duplication, etc.)
2. inaccurate trees for one or both of the genes (e.g., due to poor
species sampling, convergent evolution, bad alignments, etc.)
I believe that one way to try to determine what causes differences
between trees of different genes is to generate trees for the two genes
using similar methods and similar sets of sequences (from the same species).
This removes potential problems due to different species sampling,
different numbers of sequences, and the use of different techniques. Such a
comparison has been done by Ludwig et al. (Antonie van Leeuwenhoek 64:
285-305, 1993) for EF-Tu and the ATP synthase beta subunit. They compared
these trees with trees of SS-rRNA genes from similar species.
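When two trees are built from the same set of species, one common way to put a number on how "similar" their topologies are is the Robinson-Foulds distance: the count of groupings (clusters) found in one tree but not the other. This is a hedged illustration, not the measure used in the studies mentioned here. The Python sketch below assumes rooted trees written as nested tuples of taxon names; the function names and the two toy topologies are hypothetical and are not real results.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree: either a taxon name or a tuple of subtrees (toy representation).
Tree = Union[str, Tuple["Tree", ...]]

def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets below each internal node, excluding the root (shared by all trees)."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    root = walk(tree)
    found.discard(root)
    return found

def taxa(tree: Tree) -> FrozenSet[str]:
    """All taxon names at the tips of the tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(taxa(child) for child in tree))

def robinson_foulds(a: Tree, b: Tree) -> int:
    """Number of clusters present in exactly one of the two trees."""
    if taxa(a) != taxa(b):
        raise ValueError("both trees must contain the same taxa")
    return len(clusters(a) ^ clusters(b))

# Toy topologies (hypothetical, not real results).
rrna_like = ("Aquifex", ("Thermotoga", ("Deinococcus", ("Bacillus", "Escherichia"))))
reca_like = ("Aquifex", ("Deinococcus", ("Thermotoga", ("Bacillus", "Escherichia"))))
print(robinson_foulds(rrna_like, reca_like))  # 2
```

The requirement that both trees share the same taxa is exactly why sampling the same species for both genes matters: the distance is only defined on a common set of tips.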
I have done the same thing with RecA (J. Mol. Evol., in press). For
the RecA analysis, I generated trees of the 65 available complete RecA
protein sequences using parsimony and distance techniques. I also
generated trees for SS-rRNA sequences from essentially the same species (a
few species were represented by close relatives of the species in the RecA
analysis), using similar techniques (the only significant difference was
in how distances were calculated for the distance-based techniques).
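Protein and nucleotide alignments are usually corrected differently for multiple substitutions at the same site, which is one reason distance calculations for RecA and SS-rRNA would differ. The exact corrections used are not stated here; as an assumption, the Python sketch below uses two common textbook choices, the Jukes-Cantor correction for nucleotides and the Poisson correction for amino acids, both starting from the proportion of differing ungapped sites (the p-distance). The function names are hypothetical.

```python
import math

def p_distance(s1: str, s2: str) -> float:
    """Fraction of differing sites among aligned positions where neither has a gap."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for nucleotides (four equally likely states)."""
    if p >= 0.75:
        return math.inf  # saturated: too many differences to estimate a distance
    return -0.75 * math.log(1 - 4 * p / 3)

def poisson_protein(p: float) -> float:
    """Poisson correction for amino acids (ignores unequal exchange rates)."""
    if p >= 1.0:
        return math.inf
    return -math.log(1 - p)

print(round(p_distance("ACGT-A", "ACGTTG"), 3))  # 0.2
print(round(jukes_cantor(0.2), 4))  # 0.2326
print(round(poisson_protein(0.2), 4))  # 0.2231
```

With the same observed p, the two corrections give different distances because they assume different numbers of possible states per site.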
Overall, the trees for the two molecules have similar topologies
(especially when comparing trees generated by similar techniques) and they
also have similar resolution for specific phylogenetic relationships. Thus
the RecA trees support the general patterns of bacterial evolution found
in the SS-rRNA trees. Most of the differences between the trees are in
areas of low resolution in one or both trees (e.g., as indicated by
low bootstrap values). These differences are therefore more likely to
reflect a lack of phylogenetic signal than different histories of the
genes. However, there are some differences in areas of "good" resolution.
I can see two causes for these: (1) true differences in the history of
the two genes (e.g., perhaps some recA genes have been transferred between
species), and (2) misleading indications of "good" resolution. For
example, bootstrap values are used by many people, including myself, to
get an idea of the reliability of a particular branching pattern.
However, if there have been directional changes across an entire molecule
(such as could be due to GC content convergence), then one could get high
bootstrap values for a pattern that does not represent the true history
of a gene, because resampling the sites of an alignment cannot remove a
bias that is present at all of them. I believe that an example of this in
my analysis is the position of the sequences from Thermotoga maritima.
The RecA and SS-rRNA trees differ in where they place the respective
genes from T. maritima. With the species I analyzed, the T. maritima
SS-rRNA branches very deeply within the bacteria (the trees were rooted
with the sequence from Aquifex pyrophilus, and the T. maritima rRNA is
the next-deepest branch, with high bootstrap values). The exact position
of the T. maritima RecA is unresolved (it differs between parsimony and
distance techniques); however, it always branches above the
Deinococcus-Thermus group. In this case I believe that the deep branching
of the T. maritima rRNA may be due to GC content convergence, as suggested
by L. Moran (both T. maritima and A. pyrophilus are hyperthermophiles, and
the rRNAs of hyperthermophiles tend to be GC-rich).
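To see why a compositional bias spread across a whole molecule can survive the bootstrap, recall that the standard nonparametric bootstrap builds each replicate by resampling alignment columns with replacement. The Python sketch below uses a hypothetical four-taxon toy alignment (hot1, hot2, meso1 and meso2 are made-up names) in which two taxa are GC-rich at every site. Every replicate inherits that bias, so a tree method that is pulled together by shared composition would recover the same, possibly wrong, grouping in nearly every replicate. The sketch only tracks composition; it does not build trees.

```python
import random
from typing import Dict

def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One nonparametric bootstrap replicate: resample columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are used for every taxon, so columns stay intact.
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}

def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous nucleotides (gaps and Ns ignored)."""
    bases = [c for c in seq.upper() if c in "ACGT"]
    return sum(c in "GC" for c in bases) / len(bases) if bases else 0.0

# Hypothetical toy alignment: two GC-rich and two AT-rich sequences.
alignment = {
    "hot1": "GCGGCCGCGGCC",
    "hot2": "GCCGCGGCGCCG",
    "meso1": "ATAATTATAATA",
    "meso2": "ATTATAATTATA",
}

rng = random.Random(1)  # fixed seed so the run is reproducible
replicates = [bootstrap_replicate(alignment, rng) for _ in range(100)]

# Fraction of replicates in which both GC-rich taxa stay more GC-rich
# than both AT-rich taxa.
support = sum(
    min(gc_content(r["hot1"]), gc_content(r["hot2"]))
    > max(gc_content(r["meso1"]), gc_content(r["meso2"]))
    for r in replicates
) / len(replicates)
print(support)  # 1.0
```

The toy case is deliberately extreme; in real data the bias is partial, but the same logic applies: the bootstrap measures sampling error among sites, not systematic error shared by them.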
To summarize: (1) trees of RecAs and rRNAs from the same species are
very similar; (2) therefore, either both trees are being misled by the same
factors or both are accurate; and (3) the use of sequences from the same
sets of species allows one to determine whether differences between trees
are significant and (possibly) what might cause those differences.
Jonathan A. Eisen
--------------------------------------------------------------------
! Jonathan A. Eisen 415-723-2425 (lab) !
! Department of Biological Sciences 415-725-1848 (fax) !
! Stanford University 415-497-0599 (home) !
! Stanford, CA 94305-5020 jeisen at leland.stanford.edu !
--------------------------------------------------------------------