Evolutionary tree of *all* proteins

Dale R. Worley drw at kutta.mit.edu
Fri Oct 14 10:13:49 EST 1994


In article <1994Oct13.114907.1 at wsuhub.uc.twsu.edu> mcdonald at wsuhub.uc.twsu.edu writes:
   One danger in trying to build a comprehensive line of
   descent for all proteins using sequence similarity alone
   occurs when you reach a certain level of dis-similarity.

Yes, that is a very real risk.  What I would like to see is a case
where one has the sequence of the protein in question (actually, of
the gene coding for it) from enough different organisms that, when you
plot all the Nearest Common Ancestor (NCA) points on the Tree of Life,
you never get a long stretch (say, >30 million years) without an NCA
point.
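
As a rough illustration of that sampling criterion, here is a minimal
sketch that checks one lineage for over-long stretches.  The function
names and the ages in the usage note are hypothetical; it assumes you
already have the ages (in millions of years before present) of the NCA
points along a single lineage, and it treats the present day as age 0.

```python
def longest_gap(nca_ages_mya):
    """Longest stretch (in millions of years) with no NCA point on one lineage.

    nca_ages_mya: ages of the NCA points along the lineage, in millions
    of years before present. The present (age 0) is used as the starting
    endpoint; the oldest NCA point is the other endpoint.
    """
    ages = sorted(set([0.0] + [float(a) for a in nca_ages_mya]))
    if len(ages) < 2:
        return 0.0
    return max(older - younger for younger, older in zip(ages, ages[1:]))


def well_sampled(nca_ages_mya, max_gap_mya=30.0):
    """True if no stretch on the lineage exceeds max_gap_mya (assumed threshold)."""
    return longest_gap(nca_ages_mya) <= max_gap_mya
```

With made-up ages, well_sampled([20, 45, 60]) holds (the longest
stretch is 25 million years), while well_sampled([10, 60]) does not
(there is a 50 million year stretch).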

What this gains you is that you never have to say something like,
"Well, the mammalian sequence looks a bit like the reptilian sequence,
but not enough that we can definitively say they're related."  That is
because you have a solidly reconstructed sequence for the
proto-mammalian version, in effect a time machine back 60 million
years.  You then compare that sequence with the proto-sequence of
the clade that mammals split from, whose descendants are all reptiles.
The two should differ by a fairly small amount, if they're really
homologous.
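
One way to sketch the reconstruct-then-compare step is with the
bottom-up pass of Fitch parsimony.  That is my choice for illustration,
not a method named above, and a real reconstruction would also need the
top-down pass (or a likelihood method) to pick one state per site.  The
tree encoding, function names, and toy sequences are all hypothetical,
and the sequences are assumed to be already aligned.

```python
def fitch_sets(tree):
    """Bottom-up Fitch pass over a binary tree.

    tree is either a leaf (an aligned sequence string) or a (left, right)
    tuple of subtrees. Returns a list with one set per site holding the
    candidate ancestral states at this node.
    """
    if isinstance(tree, str):
        return [{ch} for ch in tree]
    left, right = (fitch_sets(child) for child in tree)
    result = []
    for a, b in zip(left, right):
        common = a & b
        # Fitch rule: intersection if non-empty, otherwise union.
        result.append(common if common else a | b)
    return result


def fraction_differing(sets_a, sets_b):
    """Fraction of sites where the two candidate-state sets share no state."""
    mismatches = sum(1 for a, b in zip(sets_a, sets_b) if not (a & b))
    return mismatches / len(sets_a)


# Toy data: three "mammal" sequences and two "reptile" sequences.
proto_mammal = fitch_sets((("ACGT", "ACGA"), "ACGT"))
proto_reptile = fitch_sets(("ACCT", "ACCT"))
difference = fraction_differing(proto_mammal, proto_reptile)
```

A small value of difference between the two proto-sequences is what
you would expect if the genes are really homologous.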

If the sequences evolved convergently, the reconstructed
proto-sequences going back in time will diverge from each other,
presumably eventually pointing to the proto-sequences from which each
of the convergent sequences was derived.

   The problem is that, by single-mindedly equating
   the degree of sequence difference with evolutionary
   divergence, you become blinded to the possibility of
   evolution by sequence convergence.  It is quite possible
   that, for some important cellular functions, there are
   a few protein motifs that outperform all others.

That's why you really want to use the gene sequence, rather than the
protein sequence.  There's a lot of information in the DNA that simply
doesn't affect the encoded protein sequence (in many codons, the
third position doesn't matter), and so should not be subject to
convergent evolution.  To the degree that this idea has been tested,
the evidence is consistent with standard theory:  nucleotide positions
that don't affect the protein sequence show faster change (genetic
drift) than positions that do affect the sequence.
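
A minimal sketch of how one might look at this for a pair of genes:
tally the nucleotide differences by codon position.  The function name
and the example sequences are hypothetical, and it assumes two
already-aligned, in-frame coding sequences of equal length with no
gaps.  Third-position counts are only a crude proxy for silent change,
since not every third-position change is silent.

```python
def differences_by_codon_position(seq_a, seq_b):
    """Count mismatches at codon positions 1, 2 and 3.

    seq_a, seq_b: aligned, in-frame coding sequences of equal length
    (a multiple of 3), with no gaps. Returns [n_pos1, n_pos2, n_pos3].
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if len(seq_a) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    counts = [0, 0, 0]
    for index, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            counts[index % 3] += 1
    return counts


# Made-up example: both differences fall on third positions.
example_counts = differences_by_codon_position("ATGGCTAAA", "ATGGCCAAG")
```

If third positions are largely free of selection on the protein, the
third entry should tend to be the largest when summed over many genes.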

Dale

Dale Worley		Dept. of Math., MIT		drw at math.mit.edu
--
In some respects, the [modern educational] reforms worked perfectly.
The self-esteem building, for example, was a spectacular success.  In
a recent comparison of math skills, eighth grade students in America
scored well below like-aged students in such countries as South Korea,
with more than four times as many Korean students able to do "complex
problem solving."  But in a survey that accompanied the test, about
two-thirds of the American students reported that they were "good at
math," while less than 25 percent of Korean students made that claim.
Never before have so many felt so good about so little.
-- Robert Lukefahr