Homology/similarity/identity: proper usage

Doug_Eernisse at UB.CC.UMICH.EDU
Fri Feb 1 16:40:26 EST 1991

This is another response to the recent query by David Steffen regarding the use of
the term "homology." I thought I might as well throw in my two cents' worth.
 Many molecular biologists commonly use informal, arbitrary criteria as 
 support for statements of the "homology" of two genes. For example, they 
 might suggest that if two peptide sequences in the same organism were 
 highly similar (e.g., 85 percent) then one could be confident that the
 proteins were "homologous", the result of a gene duplication event, rather
 than similar because of parallel evolution for similar function.
 It seems to me that hypotheses of homology are only relevant to phylogenetic
 inference at the level at which they are proposed to be synapomorphies (shared
 derived similarities) on a cladogram. Therefore, it is hopeless to try to provide 
 evidence for homology by the comparison of only two taxa or sequences. The only 
 interesting evidence one can bring to bear on the issue of common ancestry 
 is shared "special" similarity relative to one or more outgroup taxa or 
 sequences. This issue is, I think, distinct from the issue of whether
 homology is used as many of us use the term synapomorphy, as a proposal 
 of homology, or as the actual similarity due to common ancestry which is 
 ultimately impossible to prove.
 With sequence data, there are also problems of specifying the level of
 homology. For example, Michael Ghiselin (Syst. Zool. 18: 148-149 (1969)) 
 uses the following hypothetical example:
 A  Asp-Val-Glu-Met-Ala
 B  Asp-Pro-Glu-Met-Ala
 C  Asp-Pro-Thr-Met-Ala
 D  Gly-Pro-Thr-Met-Ala
 E  Gly-Pro-Thr-Tyr-Ala
 F  Gly-Pro-Thr-Tyr-Ser
 Ghiselin argues that similarity is a relation between the peptides as
 wholes, which decreases from A to F, while homology is a relation between 
 the parts. He argues, for example, that Asp is hypothesized to be homologous
 to Gly in A and F, respectively, given this alignment of the sequences.
 He also argues that the peptide sequence A could be homologous to F even
 though they are completely dissimilar. One can speak of the correspondence
 between nucleotides or amino acids in terms of their position in a sequence
 which is hypothesized to be homologous. Although Ghiselin doesn't consider
 this use of homology, one may also, more normally, speak of the shared
 similarity of D, E and F at site 1, relative to A, B and C, which could
 be a synapomorphy (hypothesis of homology), depending on the outgroup(s)
 one selects, which in turn determine the cladogram topology. One can also 
 hypothesize that peptide F is homologous to peptide A, or more precisely, 
 hypothesize that the shared ancestor of A and F had a single protein-coding 
 gene which is traceable, by descent, to the genes in A and F which produced 
 these peptides.
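 To make the distinction between whole-sequence similarity and site-by-site
 comparison concrete, here is a minimal Python sketch using the six
 hypothetical peptides above. The function names (residues, identity,
 group_by_site) are my own, chosen only for illustration, and the sketch
 assumes the sequences are already aligned position by position as shown.
 It measures similarity only; it does not test any hypothesis of homology.

```python
# Ghiselin's hypothetical peptides, aligned as given in the post.
PEPTIDES = {
    "A": "Asp-Val-Glu-Met-Ala",
    "B": "Asp-Pro-Glu-Met-Ala",
    "C": "Asp-Pro-Thr-Met-Ala",
    "D": "Gly-Pro-Thr-Met-Ala",
    "E": "Gly-Pro-Thr-Tyr-Ala",
    "F": "Gly-Pro-Thr-Tyr-Ser",
}


def residues(peptide):
    """Split a hyphen-separated peptide into a list of residue names."""
    return peptide.split("-")


def identity(p, q):
    """Fraction of aligned positions at which two peptides are identical."""
    a, b = residues(p), residues(q)
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_by_site(peptides, site):
    """Group sequence names by the residue found at a 1-based aligned site."""
    groups = {}
    for name, peptide in peptides.items():
        groups.setdefault(residues(peptide)[site - 1], []).append(name)
    return groups


if __name__ == "__main__":
    # Similarity of each whole peptide to A.
    for name, peptide in PEPTIDES.items():
        print(name, f"{identity(PEPTIDES['A'], peptide):.0%}")
    # Shared states at site 1.
    print(group_by_site(PEPTIDES, 1))
```

 Identity to A falls from 100 percent for A itself to 0 percent for F, while
 site 1 splits the sequences into A, B, C (Asp) and D, E, F (Gly). Which of
 those two shared states counts as derived cannot be read off these numbers;
 it still depends on the outgroup.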
 Confusing, isn't it?
 Doug Eernisse
 usergdef at ub.cc.umich.edu
 usergdef at umichub.bitnet
 Museum of Zoology and Dept. of Biology
 University of Michigan

More information about the Mol-evol mailing list
