sequence homology inference

bjadams at CRCVMS.UNL.EDU
Fri Nov 10 09:42:33 EST 1995

Concerning the determination of homologous sequences,
David Maddison wrote:

<"This collection of sequences produce proteins that all have the
<same function, but the sequence similarities of some of them
<to the others are very low; there are conserved residues that
<are present in some, but not all of the sequences."

<"Another way to put it is that I would like to know if this
<collection of sequences is monophyletic on the grand tree of
<all gene trees, or if it is para/polyphyletic with the intervening
<sequences being of different function.  While this is
<fundamentally a phylogenetic question (presuming such a
<grand tree of all gene trees exists), it is a horrendous one
<to answer in that the sequences are so divergent that
<one can't do a normal phylogenetic analysis on them - there's
<just nothing to get a hold of."

	It sounds to me as though what you are really asking is whether
these gene sequences are homologous.  Lewin (1987 Science 237:1570)
discussed this in terms of semantics, and many other authors have
addressed the assessment of "homology" in molecular sequence data (e.g.
Patterson, 1988 Mol. Biol. Evol. 5(6):603-625).  However, if what you are
trying to do is build a tree from sequences for which you lack sufficient
evidence to support inferred homology, then I doubt you'll uncover much.
I recall reading a paper in Cladistics a couple of years ago in which the
author randomly added nucleotide sequences to the data matrix of a "True
Tree" until the "true" topology could no longer be recovered, thus
approximating the kind of saturation with change found in a data set
such as the one you describe.  Although it is interesting to know "how
crummy your data set can theoretically be and still give you the right
answers," I would suggest that this problem is not unique to molecular
sequence data, and that systematists looking at any type of character
must still justify "inferred homology."  If you truly want to know
whether or not these genes are "monophyletic on the grand tree," I
suggest that the only rational way of finding out is to plot them on an
existing phylogeny and see whether they show congruence with the
patterns of speciation.  I think you could do this by treating some of
your larger conserved regions as single characters and then looking to
see how all of the regions fall out on the "True Tree."  Obviously, if
there is no existing tree, you can't do this.  Also, once you have done
this, you could never use these gene sequences to construct a taxonomic
phylogeny.
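
	One way to make the mapping step above concrete is to count, for
each conserved region scored as a present/absent character, the minimum
number of changes it requires on the reference tree (Fitch parsimony).
The sketch below is only illustrative: the tree, the taxon names A-F, the
region names, and the 0/1 scores are all hypothetical, and the tree is
assumed to be rooted and strictly binary.

```python
def fitch_steps(tree, states):
    """Fitch parsimony on a rooted binary tree given as nested 2-tuples.

    Leaves are taxon names (strings); `states` maps taxon -> character state.
    Returns (candidate ancestral state set, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_steps + right_steps
    # Children disagree: one change must be placed on a branch below this node.
    return left_set | right_set, left_steps + right_steps + 1


# Hypothetical reference ("True") tree for six taxa.
species_tree = ((("A", "B"), "C"), (("D", "E"), "F"))

# Hypothetical scoring: 1 = conserved region present, 0 = absent.
regions = {
    "region1": {"A": 1, "B": 1, "C": 1, "D": 0, "E": 0, "F": 0},
    "region2": {"A": 1, "B": 0, "C": 0, "D": 1, "E": 0, "F": 1},
}


def region_steps(tree, regions):
    """Minimum number of changes each region requires on the tree."""
    return {name: fitch_steps(tree, st)[1] for name, st in regions.items()}


print(region_steps(species_tree, regions))
```

	A region that needs only a single change maps onto one clade and is
congruent with the tree.  A region that needs as many changes as there
are taxa carrying its rarer state shows no congruence at all.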
	I have looked into this problem from the "chaos" and "fractal
analysis" point of view, asking the question, "Even though the data are
saturated with change, is there still an informative signal present?"  I
have found a few papers that address this topic in passing, and I could
send you references if this is indeed the direction in which you are
heading.
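
	The chaos and fractal approaches are not sketched here.  As a much
simpler stand-in for the same question, one could ask whether a character
fits a reference tree better than the same states shuffled at random
among the taxa.  The sketch below is a plain permutation test under
assumed inputs: the tree, the taxon names, the 0/1 scores, and the
function names are all hypothetical.

```python
import random


def count_steps(tree, states):
    """Minimum number of changes (Fitch parsimony) on a rooted binary tree."""
    def walk(node):
        if isinstance(node, str):
            return {states[node]}, 0
        (lset, lsteps), (rset, rsteps) = walk(node[0]), walk(node[1])
        shared = lset & rset
        if shared:
            return shared, lsteps + rsteps
        return lset | rset, lsteps + rsteps + 1
    return walk(tree)[1]


def permutation_p_value(tree, states, n_perm=999, seed=0):
    """Fraction of random relabelings that fit the tree at least as well."""
    rng = random.Random(seed)
    taxa = sorted(states)
    values = [states[t] for t in taxa]
    observed = count_steps(tree, states)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # reassign the same states to random taxa
        if count_steps(tree, dict(zip(taxa, values))) <= observed:
            hits += 1
    # Count the observed arrangement itself so p is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical reference tree and one hypothetical present/absent character.
species_tree = ((("A", "B"), "C"), (("D", "E"), "F"))
region = {"A": 1, "B": 1, "C": 1, "D": 0, "E": 0, "F": 0}

observed, p = permutation_p_value(species_tree, region, n_perm=999, seed=1)
print(observed, p)
```

	With only six taxa the test has little power, so a real data set
would need many more taxa or characters before a small p-value meant
much.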

Byron Adams
University of Nebraska
bjadams at crcvms.unl.edu

More information about the Mol-evol mailing list
