I know that I didn't respond to Dr. Gupta's posting about
chimaeric theories early enough (I was finishing my thesis).
But I would like to reply...or at least comment on the
general arguments that have been made.
The problem is that, in addition to phylogenies which
show the sisterhood of eukaryotes and archaebacteria, there
are a number of non-congruent protein phylogenies which
sometimes show a relationship between gram positive bacterial
and archaebacterial sequences (e.g. hsp70, gln synth)
OR sometimes show a relationship between gram negative bacterial
sequences and eukaryotic ones (e.g. FGARAT).
This is a fact that both Gupta and I agree on. However,
we differ radically on our interpretations of this fact.
Several other facts about the "non-congruent" protein
dataset need to be explained:
1) why are there multiple phylogenetically distant isoforms
of these proteins in single organisms or groups?
2) if there are such isoforms, then which one tracks
the host organism's phylogeny?
3) why are some organismal groups, thought to be either mono- or
paraphyletic, NOT reconstructed as such in some of these trees?
4) where does the root fall in these trees?
Broadly, my interpretation of facts 1-3 is that there is
either paralogy or lateral gene transfer happening. I do
not see how two phylogenetically related sequences can
be in the same organism without invoking one of these processes.
If paralogy is involved, then one must ask whether the relevant
taxa have been sampled for each of the paralogs
(i.e. have several gram positives, gram negatives, archaes and
eukaryotes been sampled?). If so, then perhaps we can make some generalizations
about organismal phylogeny from each of the mirror trees and not
worry about the confounding influence of paralogy.
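The sampling question can be put as a simple bookkeeping check. This
is only a rough sketch: the group labels, the function name
undersampled, the threshold min_per_group=2 (one reading of "several")
and the example data are all invented for illustration and do not
describe the real sampling of any protein.

```python
from collections import Counter

# The four groups that should be represented in each paralog's tree.
REQUIRED_GROUPS = {"gram-positive", "gram-negative", "archaebacteria", "eukaryote"}


def undersampled(sampled, min_per_group=2):
    """For each paralog, return the groups with fewer than min_per_group sequences.

    `sampled` maps a paralog name to a list of group labels, one label
    per sequenced organism. The threshold is an assumption.
    """
    result = {}
    for paralog, groups in sampled.items():
        counts = Counter(groups)
        result[paralog] = {g for g in REQUIRED_GROUPS if counts[g] < min_per_group}
    return result


# Invented placeholder data, NOT the real sampling of any protein.
example = {
    "paralog_A": ["gram-positive", "gram-positive", "gram-negative", "gram-negative",
                  "archaebacteria", "archaebacteria", "eukaryote", "eukaryote"],
    "paralog_B": ["gram-negative", "gram-negative", "eukaryote"],
}

gaps = undersampled(example)
# Only paralogs with no gaps are candidates for organismal inference.
usable = sorted(p for p, missing in gaps.items() if not missing)
print(gaps)
print(usable)
```

Only a paralog with an empty set of gaps would be a candidate for
reading organismal phylogeny off its tree.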
If lateral gene transfer is involved, then one must sample all
of the taxa near the aberrantly treeing organism to see if one
can localize when and between whom the event has occurred. Once this
is done, then again we can all agree that the non-transferred gene
in the organism may be indicative of the organismal phylogeny.
As far as I know, for GS, GDH, AspAT, FGARAT, proC, we do not
have all of the necessary information mentioned above to decide
what is paralogy and what is lateral transfer and what is organismal
phylogeny.
My central point is that until we have this information we cannot
say that these phylogenies speak to a particular organismal relationship.
We must first tease apart the paralogies and lateral transfers
that confound such an inference. Remember, WE KNOW THAT ONE
OR THE OTHER OF THESE PROCESSES MUST HAVE HAPPENED IN THE
HISTORY OF THESE GENES (the argument from multiple isoforms described above).
Issues 3-4 bear on these datasets too, but become particularly
relevant when discussing hsp70, the cornerstone of Gupta and Golding's
scenario.
Firstly, the G&G scenario implies that two kinds of trees
of organismal relationships should be found:
tree 1: ((Gm+ve,Gm-ve),(Archaebacteria,Eukaryotes))
this is the conventional view...and:
tree 2: (Archaebacteria,(Gm+ve,(Gm-ve,Eukaryotes)))
which indicates a gram negative eubacterial contribution
to the nuclear genome.
(Note, I am implying a root with these trees).
On this hypothesis one should NOT expect the root to fall
such that these are the trees:
tree 3: ((Archaebacteria,Gm+ve),(Gm-ve,Eukaryotes))
tree 4: (((Archaebacteria,Gm+ve),Gm-ve),Eukaryotes)
tree 5: (((Archaebacteria,Gm+ve),Eukaryotes),Gm-ve)
tree 6: (Gm+ve,(Archaebacteria,(Eukaryotes,Gm-ve)))
Notice that trees 3-6 and tree 2 are identical if there is no root
on the tree, but they are very different if there is a root.
Tree 2 implies that archaebacteria diverged from a common ancestor
of all other groups-- they are not specifically related to any of the
other groups.
By contrast, all other trees suggest that archaes are not the most
deeply branching lineage.
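The rooted/unrooted point can be checked mechanically. What follows is
a minimal sketch, not part of anyone's published analysis: the trees
are typed in with balanced parentheses, the helper names (parse,
leaves, unrooted_splits) are made up for this illustration, and only
topology is compared (no branch lengths).

```python
def parse(newick):
    """Parse a simple Newick-style string (no branch lengths) into nested tuples."""
    s = newick.replace(" ", "")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # skip "("
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # skip ","
                children.append(node())
            pos += 1                      # skip ")"
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]               # a leaf name

    return node()


def leaves(t):
    """Return the set of taxon names below node t."""
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaves(c) for c in t))


def unrooted_splits(t):
    """Return the non-trivial bipartitions of the taxa, ignoring the root."""
    all_taxa = leaves(t)
    splits = set()

    def walk(n):
        if isinstance(n, str):
            return
        side = leaves(n)
        other = all_taxa - side
        if len(side) > 1 and len(other) > 1:
            splits.add(frozenset([side, other]))
        for c in n:
            walk(c)

    walk(t)
    return splits


TREES = {
    1: "((Gm+ve,Gm-ve),(Archaebacteria,Eukaryotes))",
    2: "(Archaebacteria,(Gm+ve,(Gm-ve,Eukaryotes)))",
    3: "((Archaebacteria,Gm+ve),(Gm-ve,Eukaryotes))",
    4: "(((Archaebacteria,Gm+ve),Gm-ve),Eukaryotes)",
    5: "(((Archaebacteria,Gm+ve),Eukaryotes),Gm-ve)",
    6: "(Gm+ve,(Archaebacteria,(Eukaryotes,Gm-ve)))",
}

splits = {k: unrooted_splits(parse(v)) for k, v in TREES.items()}
# Which trees share tree 2's unrooted topology?
same_as_tree2 = [k for k in TREES if splits[k] == splits[2]]
print(same_as_tree2)
```

With four taxa there is a single internal branch, so each unrooted
tree is fully described by one split; trees 2-6 all carry the split
{Archaebacteria, Gm+ve} | {Gm-ve, Eukaryotes} and differ only in where
the root is placed.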
My claim is that the hsp70 data is MORE consistent with tree 4 than
with tree 2. This is because, using pairwise distances, archaebacteria
do not appear to be the most divergent group; they seem closest to
the gram positives.
Part of the problem with Gupta's perspective seems to be that he
is wedded to viewing everything as a four taxon problem....if
there is no root then trees 2 and 4 are the same. But we should
always remember that such an unrooted tree can imply MANY different
rooted relationships (trees 2-6). I think that #4 is the most consistent
with the actual similarity of the genes. But if rates are a
problem, as Gupta claims, then WE CANNOT distinguish between
any of the relationships depicted in trees 1-6 (there will be
no way to place a root on the tree objectively).
A second problem with the hsp70 dataset is that the archaebacteria
are not monophyletic-- they appear polyphyletic. This has nothing
to do with the Eocyte hypothesis of Lake and coworkers-- there are
no Eocytes/Crenarchaeotes in the hsp70 dataset and if there has
been a lateral transfer from gram positives to archaes as I suggested
(and Gogarten suggested this earlier than Jim Brown and I), I
think that a Crenarchaeote homolog either won't be found
(it has been lost) or is very distant from the euryarchaeote ones, in support
of tree 4 (where you just replace archaebacteria with euryarchaeotes
and you put crens with the eukaryotes).
Now, what Gupta and Golding might be finding is a consistent
tendency for archaebacteria and gram positives to go together-
there may have been a lot of transfer from the latter to the
former...but this is hardly the same thing as finding a consistent
relationship between eukaryotes and gram negatives (tree 2).
Finally, it should be mentioned that we ALREADY know that most
eukaryotes have had a huge gene invasion from the gram negative
proteobacterial endosymbiont that gave rise to mitochondria.
So we expect to see a gm-ve/eukaryote grouping in this case, and
the specific relationship should be with the alpha-proteobacteria.
SO, IF there is a signal of this sort in the datasets,
we can test whether it's a mitochondrial gene
(e.g. FGARAT could be such a case--but there is better evidence
for paralogy for this protein). One should note that the simple
removal of mitochondrially-targeted sequences from consideration
is not adequate to control for this. There are quite a few cases
where the gene of a compartmentalized enzyme clearly came from
the nuclear lineage (GS in Drosophila), and there are also cases
where a cytosolic enzyme may have been replaced by an organelle-derived
homolog (PGK in higher plants).