HSP70 phylogeny

G. Brian Golding golding at mcmail.cis.mcmaster.ca
Thu Dec 22 10:08:30 EST 1994

There have been several recent postings regarding phylogenies based on
the HSP70 genes and, because there have been several misinterpretations
(in our opinion anyway), we felt that a response was needed.

The discussion began with a question by Dr. Paul Lepp ...

     In article <PWLEPP-151294102114 at schmidt2.mph.msu.edu>,
     PWLEPP at rrn.mph.msu.edu (Paul Lepp) says:
     >A few months ago there was a discussion here about phylogenies based on
     >heat shock proteins and a number of other proteins.  From what I remember
     >it seemed that the argument was made that eukaryotes could have arisen from
     >the fusion of a gram pos. and an archea.  Could someone be so kind as to
     >pass along a few cites on the HSPs and the gram pos./archea  connection. 
     >Much appreciated.
     >Paul Lepp

To this posting there were several responses by Dr. Larry Moran and Dr.
Peter Gogarten.  In one of his responses Dr. Moran questions the relationship 
of HSP70 proteins and MreB proteins...

     I have a problem with this. According to the published alignments
     of MreB and HSP70's there is no homology. For example, Gupta and
     Golding (1993) show an alignment of about 340 aa's with about 40
     identities and 9 gaps.  (The number of identities depends on which
     MreB or HSP70 sequences are looked at.) This corresponds to about
     12% similarity and a normalized alignment score (NAS) of about 50.
     According to Doolittle, NAS has to be well above 200 (!) in order
     to claim homology. It looks to me like there may be an ATPase
     binding motif that has arisen independently in many proteins by
     convergent evolution. (Most of the MreB and HSP70 similarities are
     in short stretches of sequence that are involved in ATP binding.)

Dr. Gogarten has already publicly responded that Dr. Moran's opinion
is, in fact, in error.  He demonstrated this using distance matrices;
however, it should also be noted that it can be shown in many other
ways.  First, in the original publication showing this (Gupta and Singh,
J.Bact.174:  4594-4605, 1992) the statistics were actually presented
along with the alignment.  There it was shown that "the statistical
significance of [the] similarity between any two sequences was
evaluated using the RDF2 program" of Pearson, and that the HSP70 ATPase
domain had significant similarity with the MreB protein via this test
-- for example, the human HSP70 ATPase domain/_E.coli_ MreB alignment
score is 11.4 standard deviations above the mean shuffled scores.  In
this paper, the alignment of _H.marismortui_ HSP70 and _E.coli_ MreB is
labeled with "27.7% identity" and "51.2% similarity".  Statistics are
also presented in this paper to prove the assertion that there is
unusual similarity between the first and second quadrants of the
_H.marismortui_ and _B.subtilis_ HSP70s, which is also supported by
the crystal structure data on HSP70 (see Gupta and Golding, J.Mol.Evol.
37:573-582, 1993 for relevant references and discussion).  If desired,
the similarity between the HSP70 and MreB sequences can also be seen
with a dot plot, though only via "similar" residues and not identical
ones.  But perhaps
the easiest way to see the similarity of the sequences is simply to do
a BLAST search of the MreB protein.  There are sufficiently few MreB
proteins that they will not swamp out even the default report and you
will find that HSP70 proteins are again listed with strong statistical
similarity to MreB.
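
The shuffling test mentioned above can be sketched as follows.  This is
only an illustration of the idea and not the RDF2 program itself: it
scores a local alignment with a simple match/mismatch scheme rather
than a real substitution matrix, and the function names, scoring values
and number of shuffles are all assumptions made for the example.  The
value returned is the number of standard deviations by which the real
score exceeds the mean score against shuffled copies of the second
sequence.

```python
import random
import statistics


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty).

    The match/mismatch/gap values are illustrative assumptions only.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def shuffle_z(a, b, n_shuffles=200, seed=0):
    """Z-score of the real score against scores for shuffled copies of b.

    Shuffling keeps the amino acid composition of b but destroys its order.
    """
    rng = random.Random(seed)
    real = local_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        shuffled_scores.append(local_score(a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        # Degenerate case: every shuffled score was identical.
        return float("inf") if real > mean else 0.0
    return (real - mean) / sd
```

Calling shuffle_z(seq_a, seq_b) on two protein strings gives a figure
comparable in kind (though not in scoring details) to the "standard
deviations above the mean shuffled scores" quoted above.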

Hence, there are many simple ways to demonstrate the similarity between
MreB and HSP70.  One could, of course, claim that this is only
similarity due to convergent evolution and not true homology but this
would be an unprecedented conclusion.

Dr. Gogarten states ...

     As the HSP70 homologues suggest relationships very distinct from
     ATPases (and 16S rRNA) I had a closer look at the data.  The
     sequences show very convincingly a close association between HSP70
     homologues from gram positive bacteria and from archaebacteria or
     Archaea (so far only sequences from two Euryarcheota).  The
     obtained tree are similar to the glutaminsynthetase data (see
     citations below).

     Concerning the close relationship between gram negative bacteria
     and eukaryotes I think the authors and others completely mis- (or
     over-)interpret the data.  The trees they Gupta et al. calculated
     are all unrooted.  If one uses midpoint rooting (i.e., one assumes
     a molecular clock) the root is placed between the eukaryotes on
     one side and all the prokaryotes on the other side (This was done
     by Sharon Shtang in here dissertation at the Univ.  of Toronto).

It is generally a bad idea to assume molecular clocks.  This has been
amply and completely demonstrated in extensive detail and with
theoretical justifications by John Gillespie.  (He has had many
publications in this area, but an excellent source for more information
is his book "The Causes of Molecular Evolution" 1991, Oxford Univ.
Press).  Phylogenetic techniques such as UPGMA which assume a molecular
clock have been shown to be _much_ poorer than neighbor joining which
does not.  This is why Iwabe _et al_ did _not_ use a molecular clock to
root "the tree of life" but rather used duplicated genes to find the
root without having to assume a molecular clock.  However, if the
eukaryotic cell nucleus is a chimera as suggested by our data, then
this rooting could also give misleading results (discussed in Gupta
and Singh, Current Biol. 4:1104-1114, 1994).
We would urge readers of this news group who are inexperienced with
phylogenetic reconstruction not to assume that a molecular clock will
hold for their data.  It is generally a bad assumption and should be
tested.  It is a particularly bad assumption when taxa have been
separated for long periods of time (as in this case).
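
A small worked example may help show why the clock assumption matters.
The distances below are made up for illustration: they are exactly
additive on a hypothetical tree ((A,B),(C,D)) in which B and D evolve
five times faster than A and C.  The taxon names, branch lengths and
function names are all assumptions.  With four taxa the first pair
joined fixes the unrooted topology, so the sketch only compares that
first step for a clock-based method (UPGMA) and for neighbor joining.

```python
from itertools import combinations

# Hypothetical additive distances for the true tree ((A,B),(C,D)).
# Assumed branch lengths: A=1, B=5, C=1, D=5, internal branch=1,
# so the lineages clearly do not follow a molecular clock.
DIST = {
    ("A", "B"): 6, ("A", "C"): 3, ("A", "D"): 7,
    ("B", "C"): 7, ("B", "D"): 11, ("C", "D"): 6,
}
TAXA = ["A", "B", "C", "D"]


def d(x, y):
    """Symmetric lookup into the distance table."""
    return DIST[(x, y)] if (x, y) in DIST else DIST[(y, x)]


def upgma_first_pair():
    """UPGMA joins the pair with the smallest raw distance."""
    return min(combinations(TAXA, 2), key=lambda p: d(*p))


def nj_first_pair():
    """Neighbor joining joins the pair minimizing
    Q(i,j) = (n-2)*d(i,j) - r_i - r_j, where r_i is the row sum for i."""
    n = len(TAXA)
    r = {t: sum(d(t, u) for u in TAXA if u != t) for t in TAXA}
    return min(combinations(TAXA, 2),
               key=lambda p: (n - 2) * d(*p) - r[p[0]] - r[p[1]])
```

Here UPGMA pairs the two slowly evolving taxa A and C, which are not
sisters on the assumed tree, while neighbor joining selects a true
sister pair.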

Dr. Gogarten continues ...

     The same result is obtained if one uses an 'outgroup':  The front
     half of the HSP70 homologues is homologous to the MreB proteins of
     E.coli and Bacillus (i.e. a gram positive and a gram negative). We
     (Elena Hilario and I) did some so far unpublished analyses including
     these proteins in a phylogenetic analyses of the HSP70 homologues.
     Using parsimony, distance matrix or maximum likelihood analysis, the
     MreB proteins always group between Eukaryotes on one side and all the
     prokaryotes on the other.  As far as I can see there is no indication
     in the HSP70 data what so ever indicating a close association between
     gram negative bacteria and eukaryotes.

The use of MreB as an "outgroup" to root the HSP70 tree is not new and
was considered by us more than two years ago.  However, when we did
this we found several problems.  First, although MreB and HSP70 are
homologous, many different potential alignments can be constructed
depending on gap penalties and mismatch penalties.  This is not
surprising since they are distinct genes which probably diverged within
the common ancestor of all living organisms.   While all of these alignments
showed a gap in the MreB sequence in the same place where additional
amino acids are found in eukaryotic and gram negative HSP70s, there was
no rational way to determine which of these alignments is the
correct one.  (Depending upon the alignment that one uses, somewhat
different results between the species might be obtained.)  Using the
published alignment between MreB and HSP70 (Gupta & Singh 1992; Gupta &
Golding 1993) we carried out a detailed analysis on two MreB protein
sequences and a selection of 18 HSP70 protein sequences (truncated to
the same length as the short MreB protein).  The sequence data, alignment
and the results of these analyses are freely available to anyone but
they are rather long and so we will not waste bandwidth here.  If you
would like a copy simply send e-mail to "Golding at McMaster.CA" and we
will send you the actual output of the programs.  The sequences of
these 20 proteins were bootstrapped and then analyzed by neighbor
joining, by protein parsimony and by maximum likelihood.  The results
via neighbor joining
were that the MreB proteins cluster with the gram positive HSP70
sequences but not with any statistical reliability.  When these
bootstrapped sequences were analyzed using protpars the MreBs clustered
within the archaebacterial HSP70 sequences.  For the maximum likelihood
trees, five alternatives were tested including a clustering with
archaebacteria, with gram positives, and with eukaryotes.  Like the
neighbor joining, the likelihood algorithm preferred the clustering
with the gram positives.  But again, there was not any statistical
significance to this result.
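
For readers unfamiliar with bootstrapping an alignment, the resampling
step can be sketched as below.  This is a generic illustration and not
the programs actually used for the analyses above; the function names
and the dictionary layout (sequence name mapped to an aligned string)
are assumptions.  Each replicate draws alignment columns with
replacement, and every replicate would then be passed to the
tree-building method of choice.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    `alignment` maps sequence names to aligned strings of equal length.
    Columns are sampled with replacement, and the same sampled columns
    are used for every sequence so that homologous sites stay together.
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


def replicates(alignment, n, seed=0):
    """Generate n bootstrap replicates using a reproducible seed."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]
```

The fraction of replicates in which a given grouping appears is the
bootstrap support for that grouping; a grouping that appears in only a
modest fraction of replicates has no statistical reliability.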

The lack of statistical significance is not surprising considering the
great distance involved between two distinct genes.  It is for this
reason that we did not present a rooted tree -- we have no statistical
validity to base it on.  Additionally, the trees did not require a root
to demonstrate that they are unusual.  Whether or not they are rooted
the trees are still unusual.  The algorithms are not capable of
constructing reliably rooted trees but they are capable of constructing
good unrooted trees and these demonstrate the unusual nature of the

Borrowing from Dr. Gogarten's own tree, but remembering that it _must_ be
considered unrooted, the phylogeny is ...

           |--------- eukaryotic HSP70 hom.
           |
           |     |--- gram negative bacteria
           |-----|
                 | * |-- gram positive bacteria
                 |---|
                     |-- archaebacteria

(We have added an asterisk to indicate that this branch length _is_
statistically significant).  The tree that we have found and the tree
that Dr. Gogarten has found for HSP70 is an unusual phylogeny for the
major groups of life and is not a branching order that is commonly

As pointed out in our publications (see below), one striking feature of
the HSP70 family of sequences is that in contrast to the homologs from
gram-positive bacteria and archaebacteria, all of the homologs from gram
negative eubacterial and eukaryotic species contain a relatively
conserved stretch of 23-26 extra amino acids in precisely the same
position in their sequences.  We have presented detailed arguments that
these extra amino acids constitute an insert that took place only once
in the common ancestor to gram negative bacteria (see Gupta & Singh
1992; Gupta & Golding 1993).  However, Dr. Gogarten states ...

      The insertion that according to Gupta et al unites gram negative
      eubacteria and eukaryotes appears in our analysis as [a deletion]
      that unites gram positives and archaebacteria.

and Dr. Moran states ...

      That's what our data shows. Gupta and Golding claim that the
      ancestral HSP70 gene contained a gap and that gram negatives and
      eukaryotes acquired an insertion.  This makes no sense and can't
      be reconciled with their HSP70 dendrogram.

In fact, whether it was a deletion or an insertion, the indel can
only be reconciled with our unrooted tree.  Either way a close
relationship between archaebacteria and gram positives on the one hand
and gram negative eubacteria and eukaryotic species on the other hand is
still clearly observed.  
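
One way to make this concrete is to count, for each of the three
possible unrooted trees of the four groups, the minimum number of indel
events needed to explain the observed presence or absence of the extra
amino acids.  The sketch below uses Fitch parsimony on a single binary
character; this is an illustration chosen here and not the analysis in
the cited papers, and the names used are assumptions.  The count does
not depend on where the tree is rooted or on whether the event is
called an insertion or a deletion.

```python
# Presence (1) or absence (0) of the extra amino acids, as described above.
INDEL = {
    "eukaryotes": 1,
    "gram_negative": 1,
    "gram_positive": 0,
    "archaebacteria": 0,
}

# The three possible unrooted topologies for four groups, written as
# nested pairs split across the internal branch.
TOPOLOGIES = {
    "euk+gneg | gpos+arch": (("eukaryotes", "gram_negative"),
                             ("gram_positive", "archaebacteria")),
    "euk+gpos | gneg+arch": (("eukaryotes", "gram_positive"),
                             ("gram_negative", "archaebacteria")),
    "euk+arch | gneg+gpos": (("eukaryotes", "archaebacteria"),
                             ("gram_negative", "gram_positive")),
}


def fitch(tree, states):
    """Return (possible state set, minimum number of changes) for a tree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


def indel_events():
    """Minimum number of indel events implied by each topology."""
    return {name: fitch(tree, INDEL)[1] for name, tree in TOPOLOGIES.items()}
```

Only the topology grouping eukaryotes with gram negatives, and gram
positives with archaebacteria, explains the indel with a single event;
the other two topologies each require two.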

Although Dr. Moran and Dr. Gogarten contend that it is a deletion in
the gram positive and archaebacteria rather than an insertion in the
gram negative and eukaryotes, neither of them has presented any
argument or evidence to support their claim.  The presence of a gap in
MreB, the requirement of multiple, identical inserts otherwise, the
HSP70 phylogeny and the likelihood of each group in the hypothesized
early environment on earth, all point to the indel as an insertion in
eukaryotes/gram negatives.  We have published detailed arguments
supporting this claim (see references listed below).

Dr. LaBonne has written ...

    And why should we choose to believe _that_, when there are many
    sequences that _do_ support the Iwabe et al. rooting?
    [demonstrating a relationship between the eukaryotes and the
    archaebacteria]  Allow me to refer you also to the Doolittle and
    Brown symposium paper (PNAS 91:  6721-28, 1994).

(the phrase in square brackets is our addition).

Well ..., we don't want to contradict clear data that have been well
established.  So any hypothesis that explains the HSP70 data must not
contradict this data.  How is this possible?  

As we tried to explain in our paper (obviously unsuccessfully), there
is a hypothesis which permits both.  This hypothesis, which was
originally proposed by Professor W.Zillig (Curr. Opin. Genet. Devel.
1:544-551, 1991), suggests that the eukaryotic cell nucleus is derived
by superimposition or a combination of an archaebacterial and a
eubacterial genome.  By selection of some genes from the archaebacteria
to continue in the eukaryotic lineage - you would arrive at one tree;
by selection of some genes from the eubacteria to continue in the
eukaryotic lineage - you would arrive at another tree.  The HSP70 data
suggest that the eubacterium was a gram negative bacterium and Dr. Jim
Lake's results suggest the archaebacterium was perhaps an eocyte.

Dr. LaBonne continues ...

    None of the possible universal trees can be refuted by a single
    gene phylogeny; if such a procedure were accepted we would have to
    conclude that they have _all_ been falsified.  What then- special
    creation? ;-)

We do not advocate special creation.  Dr. Gogarten kindly provided a
list of other genes which also show this relationship.  His list from a
previous post can be summarized with shortened references as ...

     glutamine synthetases - Kumada _et al_. 1993; Tiboni _et al_. 1993;
                             Brown _et al_. 1994.

     carbamylphosphate synthetase - Lazcano _et al_. unpublished.

     glutamate dehydrogenase - Benachenhou-Lafha _et al_. 1993;
                               Hilario and Gogarten 1993.

     F-ATPase subunit encoding DNA isolated from Methanosarcina -
                    Sumi _et al_. 1992; Hilario and Gogarten 1993.

(This is Dr. Gogarten's list; we have not yet confirmed all of these.)
We have also found more proteins which give this unusual
phylogeny and will present them in a forthcoming issue of MBE.

In response to Dr. Lepp's request, we append various references
to our work related to HSP70 phylogeny and its implications:

     R.S.Gupta and B.Singh, 1992, J.Bact. 174:4594-4605.

     R.S.Gupta and G.B.Golding, 1993, J.Mol.Evol. 37:573-582.

     R.S.Gupta, G.B.Golding, B.Singh, 1994, J.Mol.Evol. 39:537-540.

     R.S.Gupta _et al_, 1994, Proc.Natl.Acad.Sci. 91:2895-2899.

     J.Lake 1994, Proc.Natl.Acad.Sci. 91:2880-2881. [commentary on
     the above paper].

     R.S.Gupta and B.Singh, 1994, Current Biology 4:1104-1114.

     D.M.Irwin 1994, Current Biology 4:1115-1117  [commentary on the
     above paper].

     M.Falah and R.S.Gupta,  J.Bact. 176: No. 24, 1994, in press.

     R.S.Gupta, 1995, Mol.Microbiology 15:1-11.

Finally, we urge the readers of this news group to examine these papers
(for example they include additional data on two more archaebacterial
sequences and more data on primitive eukaryotic species).  We would
welcome any well thought out alternative hypotheses which can explain
all of this data.

Brian Golding			Radhey S. Gupta
Dept.of Biology			Dept of Biochemistry
McMaster University		McMaster University
Hamilton, Canada		Hamilton, Canada
