In response to the continuing discussion regarding the
evolutionary relationship between archaebacteria, eubacteria and
eukaryotic cells (see "Is Carl Woese Losing a Kingdom?"), a number
of recent postings (by Jonathan Badger and James McInnerney) have
suggested that the SSU rRNA is the most suitable molecule for
phylogenetic purposes. While it is true that the SSU rRNA is
currently the most widely used molecule for phylogenetic
analysis, this is mainly for historical reasons and because an
extensive and ever-growing database now exists. However, the use
of rRNA for such studies has a number of pitfalls and associated
problems that are either generally overlooked or unknown to many
people. Many of these problems
(discussed below) are either not present, or greatly minimized,
in protein-based phylogenies, which in principle should therefore
be better suited for investigating deep phylogenetic
relationships going back to the beginning of life. Some of the
comparative advantages of proteins over rRNA are as follows:
(i) In proteins, each character has 20 possible states, rather
than only the four possible in nucleic acid sequences. This
provides more information per site in proteins and greatly
reduces the likelihood of mere chance alignment or multiple
substitutions at a site.
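
The per-site comparison in point (i) can be put in rough numbers. The
sketch below is only an upper bound: it assumes every state at a site is
equally likely, which real sequences do not satisfy, and the function
names are illustrative.

```python
import math

# Assumption: all states at a site are equally likely, so these
# figures are upper bounds, not measurements on real sequences.
def max_bits_per_site(n_states: int) -> float:
    """Maximum information (in bits) that one aligned site can carry."""
    return math.log2(n_states)

def chance_identity(n_states: int) -> float:
    """Probability that two unrelated sites match by chance alone."""
    return 1.0 / n_states

for name, n in (("nucleotide", 4), ("amino acid", 20)):
    print(f"{name}: {max_bits_per_site(n):.2f} bits/site, "
          f"chance identity {chance_identity(n):.2f}")
```

Under these assumptions a nucleotide site carries at most 2 bits and
matches by chance one time in four, while an amino acid site carries at
most about 4.3 bits and matches by chance one time in twenty.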
(ii) The G+C content of organisms varies greatly, and this
compositional disparity introduces many changes in the genome
that are not evolutionarily important. Such changes could lead
to a bias in phylogenetic reconstruction and these cannot be
easily identified or corrected for in structural nucleic acids
such as the 16S rRNA (see Loomis and Smith, PNAS, 87, 9093, 1990;
Hasegawa et al., JME, 36, 380, 1993; Steel et al., Nature, 364, 440,
1993; Galtier and Gouy, PNAS, 92, 11317, 1995).
As acknowledged by Woese,
"The compositional disparity in rRNA is a persistent
source of difficulty in phylogenetic analysis".
(In: Evolution at the Molecular Level,
R. Selander et al. (eds), 1991, pp. 1-24).
"The problem (of) disparity in base composition is far
more troublesome than is generally recognized and has
almost received no attention to date"
(Olsen and Woese, FASEB J. 7, 113-123, 1993).
In contrast to the structural nucleic acids, in protein-coding
genes such composition-induced changes occur predominantly at
third codon positions, where, because of the degeneracy of the
genetic code, they have a minimal effect on the encoded amino acid
sequence.
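
The degeneracy argument above can be checked by simple counting. The
sketch below assumes the standard genetic code, weights every
single-base change equally, and ignores changes to or from stop codons;
none of this models the actual mutation spectrum of any organism, and
the function name is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_fraction(pos: int) -> float:
    """Fraction of single-base changes at codon position `pos` (0, 1 or 2)
    that leave the amino acid unchanged. Changes to or from stop codons
    are skipped; all base changes are weighted equally (an assumption)."""
    same = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == "*":
                continue
            total += 1
            same += CODE[mutant] == aa
    return same / total

for pos in range(3):
    print(f"codon position {pos + 1}: {synonymous_fraction(pos):.2f} silent")
```

With these assumptions roughly 70% of third-position changes are silent,
against about 5% at the first position and none at the second, which is
why compositional drift at third positions largely fails to reach the
amino acid sequence.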
(iii) In the case of 16S rRNA, the alignment of sequences from
various species is carried out based upon the assumption that
similar secondary structural constraints apply in very distantly
related species. However, the assumption of constancy of
secondary structure in distantly related species remains largely
untested, to the best of my knowledge. This assumption could
potentially introduce serious bias in the alignment which in turn
could lead to incorrect phylogenetic relationships for certain
species. By contrast, for alignment of protein sequences from
different species, no assumption concerning secondary structure
need be made.
(iv) The length of the SSU rRNA varies considerably between
prokaryotes and eukaryotes, and shows tremendous variation among
the mitochondrial homologs (from 612 nt in Crithidia fasciculata to
1955 nt in wheat; M. Gray, Biochem. Cell Biol. 66, 325, 1988).
This length variation should be of concern in terms of sequence
alignment and phylogenetic analysis. In contrast, for some of the
highly conserved proteins such as Hsp70, the length in all
species as well as organelles is very similar (excluding the
targeting sequences and a small variable region at the
C-terminal end that is not used for phylogenetic purposes).
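
To give a sense of scale for the length variation in point (iv), the
sketch below computes a simple lower bound on how much of the longer
molecule must be left unpaired in any pairwise alignment. It uses only
the two lengths quoted above; the function name is illustrative, and
real alignments will typically leave more unpaired than this bound.

```python
def min_gap_fraction(len_a: int, len_b: int) -> float:
    """Lower bound on the fraction of the longer sequence that cannot be
    paired with a residue of the shorter one in a pairwise alignment."""
    longer, shorter = max(len_a, len_b), min(len_a, len_b)
    return (longer - shorter) / longer

# Mitochondrial SSU rRNA lengths quoted above (612 nt vs 1955 nt).
print(f"{min_gap_fraction(612, 1955):.2f}")
```

For the 612 nt and 1955 nt homologs, at least about 69% of the longer
sequence has no counterpart in the shorter one.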
(v) The rRNA genes are present in multiple copies in most
(perhaps all) prokaryotic and eukaryotic organisms. The extent of
sequence differences between these genes is presently not clear,
and this is a potential source of problems in phylogenetic
analysis. In contrast, for some of the well-characterized
proteins such as Hsp70, only a single gene has been found in all
prokaryotes (except E. coli, where a gene duplication has taken
place).
(vi) James McInnerney wrote:
> One of the main reasons that rRNA have been used for
> phylogenetic reconstruction is the idea that they are
> directly vertically transmitted and are not subject to
> lateral gene transfer.
Given the multiple copies of rRNA genes that are present in
various organisms, and their conserved function, I am not sure
how one could conclude that they are better than many of the
other genes in this regard.
However, for some of the well-characterized and highly
conserved proteins such as Hsp70, the possibility of horizontal
gene transfer or exchange between species can be excluded based on
the distinctive sequence signatures that are present in homologs
from the major groups of prokaryotes. Likewise, the possibility of
exchange between organellar and nuclear cytosolic homologs can
also be ruled out based on the distinctive sequence signatures in
the two types of homologs.
Thus, while rRNA has served, and will continue to serve, a very
useful role in molecular evolutionary studies, it would be wrong
to conclude that it is ideally suited for all types of studies. In
my opinion, it would be an even greater mistake to slight the
protein phylogenies, as they offer several advantages in such
comparisons. It is likely that the evolutionary relationships
amongst the deepest branches in the tree of life, as well as the
root of the universal tree, will be clarified by protein-based
phylogenies rather than by rRNA.
Radhey S. Gupta
McMaster University
P.S. I will respond to Arlin Stoltzfus and Larry Moran's postings
in a few weeks' time, after my grant writing is over.