There are three further points that I think are relevant to this
discussion (which used to seem like a settled issue).
(1) Since so many interesting proteins are chimeras, we should
really refer ONLY to the domains (or predicted domains) as being
members of families, superfamilies, etc., rather than the proteins
that contain them; "family" membership does connote common descent,
and in many cases only a portion of a multi-domain protein can
properly be referred to in this sense.
(2) Since the primary distinction between a protein family and a
superfamily is the degree to which a functional activity (or set of
activities) is conserved, it is inappropriate to draw a firm
conclusion based on sequence comparisons IN THE ABSENCE OF PROTEIN
FUNCTIONAL INFORMATION. This is not necessarily as important a
distinction for GENE families/superfamilies though, since their
functions can be more completely inferred from their sequences.
(3) It's never been clear to me just how we should relate the
concepts of protein family/superfamily, etc., to those of gene
family/superfamily, etc. And what about pseudogenes, synthetic
genes, and knockouts? And how should we NAME families? The current
practice is to use the name of the first family member discovered,
which is usually a poor choice.
Bob Obar
102063,2640 at compuserve.com