Superfamily phylogenetics

L.A. Moran lamoran at gpu.utcc.utoronto.ca
Tue Apr 20 10:12:09 EST 1993

When amino acid sequences can be aligned and the similarities are greater
than about 25% identity, it is reasonable to infer homology. Reliable
phylogenetic trees can be built from such comparisons, but it is preferable
to have a larger degree of similarity for at least some of the comparisons
in a multiple sequence alignment.
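
As a rough illustration of the 25% rule of thumb, here is a minimal Python
sketch that computes percent identity for two already-aligned sequences. The
function names, the choice to ignore gapped columns in the denominator (one of
several conventions in use), and the toy sequences are all assumptions made
for illustration; they are not taken from any particular program.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumes both strings come from the same pairwise alignment and use
    '-' as the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def homology_call(identity: float, threshold: float = 25.0) -> str:
    """Apply the ~25% identity rule of thumb; the threshold is adjustable."""
    if identity > threshold:
        return "homology is a reasonable inference"
    return "ambiguous: sequence identity alone cannot decide"


# Toy aligned sequences, invented for this example.
identity = percent_identity("ACDE-G", "ACDFHG")
print(identity, homology_call(identity))
```

The threshold is only a guide: a value below it does not show that two
proteins are unrelated, only that sequence comparison alone cannot settle
the question.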

Problems arise when the similarities are less than 20-25% identity. There
are obviously two possibilities: the proteins are homologous or they are
not homologous. How to decide?

It is often assumed that even in the absence of detectable sequence similarity
two proteins are homologous if their structures are similar. The hypothesis
is that the genes encoding the two proteins arose from a duplication event
but that subsequent evolution has resulted in two genes that show no trace
of similarity at the sequence level in spite of the fact that the structures
of the proteins have been conserved. This point of view is often expressed
strongly in the literature but more often it is assumed to be intuitively
obvious or axiomatic.

Joe Felsenstein writes,

     "A number of people commented, to the effect that with poor 
      alignment, it is very hard to make a phylogeny.

      For now, that's correct, but it has often been noted that 
      structures are conserved even more than the amino acids from 
      which they are made. It will be possible in the future, with 
      more algorithm development (which partly waits for more structure 
      data to play with) to infer phylogeny from structure itself, 
      amino acid sequence playing a secondary role. There is at least 
      one pioneering effort in this direction, a paper by Johnson, 
      Sali and Blundell in Methods in Enzymology volume 188, the volume 
      on molecular evolution. They measured dissimilarity of structures 
      and based a distance method on the resulting measures, getting a 
      reasonable tree.

      There are other ways too, and we can expect a lot more activity 
      in this area. So perhaps we ought not be totally negative about 
      the possibilities here."

I would like to add a note of caution and skepticism. It is possible that
the number of core structures of proteins is more limited than we realize
and that structural similarity is due to convergence and not homology.
For example, actins, HSP70s, and hexokinases have similar structures but
it is difficult to imagine how they could have evolved from a common
ancestor and ended up with unrelated sequences. The beta-barrel proteins are
another example of structurally related proteins that cannot be shown to
be homologous. If one begins with the assumption that structural similarity
is a clear indication of evolutionary relatedness, even in the absence of
sequence similarity, then I have no doubt that a measure of structural
similarity could be calculated. The key question is whether this measure
has anything to do with evolution (it does if one assumes that to begin
with!). One could construct a dendrogram showing structural relatedness
but this may not represent evolution.
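
To make the point concrete, here is a minimal Python sketch that builds a
dendrogram from a table of pairwise structural dissimilarities. It uses plain
average-linkage clustering (UPGMA) as a stand-in; this is an assumption, not
necessarily the distance method used in the paper Felsenstein cites. The
labels P, Q, R, S and the distances are invented. The algorithm will group
any set of distances, whether they arose by descent or by convergence, so the
output by itself says nothing about evolution.

```python
from itertools import combinations


def upgma(names, dist):
    """Average-linkage clustering; returns a nested-tuple dendrogram.

    names: list of labels.
    dist:  dict mapping frozenset({a, b}) -> dissimilarity for every pair.
    """
    # Each cluster key maps to (subtree, number of leaves in the subtree).
    clusters = {name: (name, 1) for name in names}
    d = dict(dist)  # work on a copy so the caller's table is untouched
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        merged = f"({a},{b})"
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        for c in clusters:
            d[frozenset((merged, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical structural dissimilarities between four proteins.
labels = ["P", "Q", "R", "S"]
table = {
    frozenset(("P", "Q")): 1.0,
    frozenset(("R", "S")): 2.0,
    frozenset(("P", "R")): 5.0,
    frozenset(("P", "S")): 5.0,
    frozenset(("Q", "R")): 5.0,
    frozenset(("Q", "S")): 5.0,
}
print(upgma(labels, table))
```

The same grouping would come out if P and Q resembled each other through
convergence on a favorable fold, which is why the dendrogram cannot be read
as a phylogeny without independent evidence of homology.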

It may turn out to be true that all proteins with similar structures are
descended from a common ancestor but we should not fall into the trap of
assuming that "a priori". The alternative hypothesis, convergence, should
not be dismissed so casually.

Laurence A. Moran (Larry)
