More than you probably wanted to know about homologs, paralogs, and orthologs

Erich Schwarz schwarz at cubsps.bio.columbia.edu
Sat Jun 22 16:31:09 EST 1996


Oliver Hobert wrote:

> Hey folks - Can anybody supply me with an explanation what "homologs",
> "paralogs" and "orthologs" mean in terms of gene sequence similarity, i.e.
> when are two similar genes called homologs, paralogs or orthologs.


    There is *no* direct connection between sequence identity and these
terms.  They merely denote the nature of the evolutionary relationship.
Furthermore, they can be debated -- at least one serious _C. elegans_
person I know (Ralf Sommer) has argued that we should term *everything*
a paralog (I don't agree, but he has a worthwhile argument for his
position...)


     The way I would go about defining these terms in practical use:

     Homolog: a gene or protein that one asserts, on the basis of
statistically significant similarity (which can be as low as 25%
amino acid identity out of 100 or more residues) to be related to
a gene or protein of interest.

     Ortholog: a homolog that one asserts, on the basis of rationally
computed gene or protein evolutionary trees, to be related to a 
gene/protein of interest by *species divergence*: their last common
ancestor should have split into two lineages by speciation, not
gene duplication (followed typically by functional diversification).

     Paralog: a homolog asserted to be related to one's favorite
gene/protein by *gene duplication*: their last common ancestor
should have undergone gene duplication, whether or not speciation
later split the lineages of the two genes/proteins.


     To put flesh on these bony abstractions:

          horse alpha-globin is the ortholog of human alpha-globin

          human beta-globin is the paralog of *both* human alpha-globin
          *and* horse alpha-globin

          all three are homologs of one another

          all three are both homologs and paralogs of, say, human myoglobin

          human myoglobin is an ortholog of horse myoglobin ... [etc.]

          without really serious computation of evolutionary trees,
          one can't reliably assert whether all of the above are orthologs
          or paralogs of bacterial globin (but it's very likely that they're
          orthologs, if only because the eubacterial-eukaryotic split is
          so ridiculously long ago)
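
     The globin relationships listed above can also be sketched in code.
What follows is a minimal Python illustration, not anything taken from
a real phylogenetics package: the toy tree, its event labels, and the
names GENE_TREE, leaves, and relationship are all assumptions made for
this example.  The "duplication" and "speciation" labels simply encode
the assertions in the list above; they are not computed from sequence
data.  Bacterial globin is left out, since its placement is exactly
what needs serious tree computation.

```python
# Hypothetical toy gene tree. An internal node is (event, left, right);
# a leaf is a gene name. Event labels are asserted by hand, not inferred.
GENE_TREE = (
    "duplication",  # ancient split between hemoglobins and myoglobin
    (
        "duplication",  # alpha-globin / beta-globin split
        ("speciation", "human_alpha_globin", "horse_alpha_globin"),
        "human_beta_globin",
    ),
    ("speciation", "human_myoglobin", "horse_myoglobin"),
)


def leaves(node):
    """Return the gene names found at or below this node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def relationship(tree, gene_a, gene_b):
    """Classify two homologs by the event at their last common ancestor."""
    if gene_a == gene_b:
        raise ValueError("need two different genes")
    all_genes = leaves(tree)
    if gene_a not in all_genes or gene_b not in all_genes:
        raise ValueError("both genes must be in the tree")
    node = tree
    while True:
        event, left, right = node
        a_on_left = gene_a in leaves(left)
        b_on_left = gene_b in leaves(left)
        if a_on_left != b_on_left:
            # The two genes part ways here, so this node is their
            # last common ancestor; its event decides the term.
            return "ortholog" if event == "speciation" else "paralog"
        node = left if a_on_left else right


for pair in [
    ("human_alpha_globin", "horse_alpha_globin"),
    ("human_beta_globin", "horse_alpha_globin"),
    ("human_alpha_globin", "human_myoglobin"),
    ("human_myoglobin", "horse_myoglobin"),
]:
    print(pair[0], "vs", pair[1], "->", relationship(GENE_TREE, *pair))
```

     Note that the classification depends only on the event label at
the last common ancestor, never on how similar the sequences are.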


       Note: I have said *nothing* about % amino acid identity, because
the rate at which identity falls with increasing evolutionary divergence
varies greatly between different types of proteins.  Histone H4
scarcely varies between cows and peas; opsins are 75% divergent at the
protein level merely between cows and flies.  Obviously no single scale
of amino acid identity can tell us, in general, whether two genes are
homologs, orthologs, or paralogs.
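
     For concreteness, % amino acid identity between two already-aligned
sequences might be computed as below.  This is only a sketch: the
function name percent_identity, the gap convention (columns with a gap
in either sequence are skipped), and the toy strings are assumptions
for illustration, not real histone or opsin sequences.  Other
denominators (full alignment length, shorter sequence length) are also
in use, and will give different numbers for the same alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # assumed convention: ignore gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Made-up aligned fragments, for illustration only.
print(percent_identity("MKT-LLVA", "MKSALLIA"))
```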


       As I understand Dr. Sommer's position, we should default to
assuming paralogy rather than orthology because we can never be sure
that a gene duplication didn't precede a species split.  I myself think
that in practical situations one can often feel sure, or make good
phylogenetic arguments, that no such duplication occurred.


--Erich Schwarz


