When amino acid sequences can be aligned and the similarities are greater
than about 25% identity, it is reasonable to infer homology. Reliable
phylogenetic trees can be built from such comparisons, but it is preferable
to have a larger degree of similarity for at least some of the comparisons
in a multiple sequence alignment.
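As a rough illustration of what "percent identity" means here, the sketch
below counts identical residues over the aligned (ungapped) columns of a
pairwise alignment. The function names, the gap character, and the choice
of denominator are assumptions made for this example; several conventions
for the denominator are in use, and the 25% cutoff is only the approximate
figure given above.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: the denominator is the number of ungapped columns.
    Other conventions (e.g. dividing by alignment length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def sequence_verdict(identity: float, threshold: float = 25.0) -> str:
    """Apply the approximate 25% rule of thumb from the text."""
    if identity > threshold:
        return "homology is a reasonable inference"
    return "ambiguous: sequence similarity alone cannot decide"


if __name__ == "__main__":
    # Toy aligned sequences, invented purely for illustration.
    pid = percent_identity("ACDE-G", "ACDFHG")
    print(pid, sequence_verdict(pid))
```

Below the threshold the function deliberately returns "ambiguous" and not
"not homologous", since low identity does not rule out common ancestry.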
Problems arise when the similarities are less than 20-25% identity. There
are obviously two possibilities: the proteins are homologous or they are
not homologous. How to decide?
It is often assumed that even in the absence of detectable sequence similarity
two proteins are homologous if their structures are similar. The hypothesis
is that the genes encoding the two proteins arose from a duplication event
but that subsequent evolution has resulted in two genes that show no trace
of similarity at the sequence level in spite of the fact that the structures
of the proteins have been conserved. This point of view is often expressed
strongly in the literature but more often it is assumed to be intuitively
obvious or axiomatic.
Joe Felsenstein writes,
"A number of people commented, to the effect that with poor
alignment, it is very hard to make a phylogeny.
For now, that's correct, but it has often been noted that
structures are conserved even more than the amino acids from
which they are made. It will be possible in the future, with
more algorithm development (which partly waits for more structure
data to play with) to infer phylogeny from structure itself,
amino acid sequence playing a secondary role. There is at least
one pioneering effort in this direction, a paper by Johnson,
Sali and Blundell in Methods in Enzymology volume 188, the volume
on molecular evolution. They measured dissimilarity of structures
and based a distance method on the resulting measures, getting a
reasonable tree.
There are other ways too, and we can expect a lot more activity
in this area. So perhaps we ought not be totally negative about
the possibilities here."
I would like to add a note of caution and skepticism. It is possible that
the number of core structures of proteins is more limited than we realize
and that structural similarity is due to convergence and not homology.
For example, actins, HSP70s, and hexokinases have similar structures but
it is difficult to imagine how they could have evolved from a common
ancestor and ended up with unrelated sequences. The beta-barrel proteins are
another example of structurally related proteins that cannot be shown to
be homologous. If one begins with the assumption that structural similarity
is a clear indication of evolutionary relatedness, even in the absence of
sequence similarity, then I have no doubt that a measure of structural
similarity could be calculated. The key question is whether this measure
has anything to do with evolution (it does if one assumes that to begin
with!). One could construct a dendrogram showing structural relatedness
but this may not represent evolution.
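To make the point concrete, here is a minimal sketch of how such a
dendrogram could be built from pairwise structural dissimilarities. It uses
plain average-linkage clustering (UPGMA) as a stand-in; it is not the
method of Johnson, Sali and Blundell, and the protein labels and distance
values are invented for illustration only. Note that the procedure returns
a tree for any distance matrix, whether the similarities arose by descent
or by convergence.

```python
from itertools import combinations


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns a nested-tuple tree.

    names: list of labels; dist: symmetric matrix of dissimilarities.
    """
    # Each cluster id maps to (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances between current clusters, keyed by unordered pair.
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        i, j = sorted(pair)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del d[pair]
        for k in clusters:
            # Size-weighted average distance from the merged cluster to k.
            dik = d.pop(frozenset((i, k)))
            djk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


if __name__ == "__main__":
    # Hypothetical proteins and made-up structural distances.
    names = ["P1", "P2", "P3", "P4"]
    dist = [
        [0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0],
    ]
    print(upgma(names, dist))
```

The output groups P1 with P2, then adds P3, then P4, but nothing in the
calculation tests whether that grouping reflects common ancestry.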
It may turn out to be true that all proteins with similar structures are
descended from a common ancestor but we should not fall into the trap of
assuming that "a priori". The alternative hypothesis, convergence, should
not be dismissed so casually.
Laurence A. Moran (Larry)