

BERLYNM at yalemed.bitnet
Sat Jul 13 14:52:00 EST 1991

        Some truisms about names:  Names can have a number of roles,
for example a mnemonic one and a classification one.   Names
perform totally unsatisfactorily in the role of unique identifier or
conveyer of all important information about the object.  Thus hisA
and hisB, etc., namings are meant to be mnemonic for 'histidine' and
to convey that a relationship exists between them that is different
from the relationship between hisA and trpA: in this case, shared
membership in the histidine biosynthetic pathway.  The role that names
are notoriously poor at playing is the unique identification role.  Not
for nothing do we have social security numbers; not for nothing do
good databases, or other registries, have 'unique identifiers' or
'surrogate keys' that are unique and unchanging and can be
simultaneously associated with old names, current canonical name,
any synonym from any source (and any amount of other information
you want to associate with it).  You can change the name as many
times as you like and as often as you want to visit confusion on the
literature, but that identifier persists for that entity.  For example,
the trpB gene in the E.coli Stock Center database carries the unique
identification number 73, assigned to it by the computer at time of
automated entry as the next serial number available.  (By contrast, the
gene apbS, which was entered much later, is uniquely identified as
Gene# 18562.)  If a bacterial geneticists' or omnigeneticists' naming
commission were to convene and decide it was imperative to name
genes by enzyme rather than by pathway, and we decided to put our db in
compliance with what would seem to me a rather foolish
recommendation, we could change the name to tsbA or whatever,
and #73 would take on a new canonical name and the old one would
become a synonym, and appropriate annotations and references
concerning the change could also be linked to that unique number.
All computer operations use the number and would be unaffected by
the name change, since the name, as far as the computer is
concerned, is a pretty trivial attribute which it routinely presents
to the user only to serve our feeble human mnemonic requirements.
All strains carrying a mutation in that gene would automatically
show Gene#73's new name in their genotypic descriptions, and all
information about Gene#73's Gene Product, Map Position, etc., would
remain intact, with no need for change or reconciliation.
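The surrogate-key arrangement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Stock Center database's actual design: the names Gene, GeneRegistry, Strain and genotype are hypothetical, and the registry starts numbering at 73 only to mirror the trpB example. Strains refer to genes by number alone, so a rename changes what users see without touching any stored reference, and the old name survives as a synonym for lookup.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count


@dataclass
class Gene:
    gene_id: int                 # surrogate key: assigned once, never changed
    canonical_name: str          # current name, shown to human users
    synonyms: list[str] = field(default_factory=list)  # former names, kept for lookup
    notes: list[str] = field(default_factory=list)     # annotations about name changes


class GeneRegistry:
    """Hypothetical registry keyed by serial gene numbers."""

    def __init__(self, first_id: int = 1) -> None:
        self._serials = count(first_id)      # yields the next available serial number
        self._genes: dict[int, Gene] = {}

    def add(self, name: str) -> int:
        gene_id = next(self._serials)
        self._genes[gene_id] = Gene(gene_id, name)
        return gene_id

    def rename(self, gene_id: int, new_name: str, note: str = "") -> None:
        gene = self._genes[gene_id]
        gene.synonyms.append(gene.canonical_name)  # old name becomes a synonym
        gene.canonical_name = new_name
        if note:
            gene.notes.append(note)

    def name_of(self, gene_id: int) -> str:
        return self._genes[gene_id].canonical_name

    def resolve(self, name: str) -> int | None:
        """Find a gene by its current name or any former name."""
        for gene in self._genes.values():
            if name == gene.canonical_name or name in gene.synonyms:
                return gene.gene_id
        return None


@dataclass
class Strain:
    name: str
    mutated_gene_ids: list[int]  # genes referenced by number, never by name


def genotype(strain: Strain, registry: GeneRegistry) -> str:
    # Names are looked up at display time, so they are always current.
    return " ".join(registry.name_of(g) for g in strain.mutated_gene_ids)


registry = GeneRegistry(first_id=73)   # start at 73 only to mirror the example
trpB = registry.add("trpB")
strain = Strain("hypothetical-strain", [trpB])
print(genotype(strain, registry))      # prints: trpB
registry.rename(trpB, "tsbA", note="renamed by a hypothetical naming commission")
print(genotype(strain, registry))      # prints: tsbA
print(registry.resolve("trpB"))        # prints: 73
```

The design choice is the one argued in the text: the name is just an attribute fetched for display, while every relationship (strain to gene, gene to annotations) hangs off the unchanging number.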
        By trying to make a name exquisitely descriptive, you may
make it precious, and lose both the mnemonic and classification
function.  Take your example, in which you would apparently like to
convey all of the following information by name alone: that it is the
gene for a specific tryptophan synthase subunit and that it is a
member of a duplicated gene set.  In doing so, you rightly regret
having to give up the classification information that it is a gene for
one of the enzymes in the tryptophan biosynthetic pathway.  All of
this information could be carried in the name only if it were a very
cumbersome and unconventional gene name.  Conventions are
established to give guidelines on what level of mnemonic or
classification a namer should consider most strongly when assigning
the name.  Pathway has been an established, useful level in many
naming conventions.  It just happens that your specific question
deals with a well-known enzyme, which also, by virtue of being at
the end of a pathway, bears a name not unlike the pathway.  While ts
for tryp synthase may be fairly mnemonic, would you be equally
inclined toward 'as-something' for the anthranilate synthase gene, or
some selection of letters from PR-AIC isomerase for hisA, or who
knows what for glutamine amidotransferase-phosphoribosyl
anthranilate transferase for the bifunctional trpD gene of coli?
Naming-by-enzyme is suitable in some cases (adh, for example), but
a blanket commitment to it is a can of worms.
        I agree with a previous commentator on advantages of some of
the bacterial conventions. Of course, some specific ones have not
worked well, and since I've had experience with these at the E. coli
Stock Center, I'll comment on those when I feel like another
        As I'm sure you know, there is much discussion of gene naming
conventions for plants, bacteria, et al., and there will be meetings in
Tucson and elsewhere.  An equally important consideration is that of
unambiguous ALLELE naming and registry.  Changing gene names
obviously affects allele names and within-gene numbering systems
may collide if, for example, GeneA turns out to be pre-existing
GeneB and is renamed as such.  You must carefully look for
ramifications of any name change.  The question of official registries in
general is an important one, also for later discussion.  My specific
advice for now is this: since new conventions are under
consideration at the moment, the guiding principle should be
to minimize confusion in the literature due to a proliferation of
name changes that may well be destined to be re-changed in a few
months.  Although computers can handle and track synonyms easily
IF they are entered properly, humans are easily confused.
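To make the allele-collision problem concrete, here is a hypothetical Python sketch. It assumes alleles are designated as gene name plus a within-gene number (written GeneA-1, GeneB-3, and so on) and adopts one possible merge policy: keep an allele's number when it is free under the surviving gene, otherwise assign the next unused number. Both the designation format and the renumbering policy are assumptions for illustration only; a real registry might choose differently. Whatever the policy, the function returns a mapping from each former designation to its new one, so the published names stay traceable.

```python
from __future__ import annotations


def merge_alleles(
    alleles: dict[str, set[int]], old_gene: str, target_gene: str
) -> tuple[dict[str, str], list[str]]:
    """Fold old_gene's alleles into target_gene (hypothetical policy).

    Returns (renamed, collisions): renamed maps every former designation to
    its new one; collisions lists former designations whose number was
    already in use under target_gene and therefore had to be renumbered.
    """
    old_numbers = alleles.pop(old_gene, set())
    target = alleles.setdefault(target_gene, set())
    # First number guaranteed unused by either gene.
    next_free = max(target | old_numbers, default=0) + 1
    renamed: dict[str, str] = {}
    collisions: list[str] = []
    for number in sorted(old_numbers):
        old_label = f"{old_gene}-{number}"
        if number in target:          # within-gene numbers collide
            collisions.append(old_label)
            new_number = next_free
            next_free += 1
        else:
            new_number = number
        target.add(new_number)
        renamed[old_label] = f"{target_gene}-{new_number}"
    return renamed, collisions


# GeneA turns out to be pre-existing GeneB and is renamed as such.
alleles = {"GeneA": {1, 2}, "GeneB": {1, 3}}
renamed, collisions = merge_alleles(alleles, "GeneA", "GeneB")
print(collisions)   # prints: ['GeneA-1']
print(renamed)      # prints: {'GeneA-1': 'GeneB-4', 'GeneA-2': 'GeneB-2'}
```

Note that GeneA-2 keeps its number while GeneA-1 cannot; it is exactly this kind of uneven outcome that makes the renamed mapping worth storing permanently.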
        The more interesting and unexplored aspect of the questions
you raise has to do with the relationship between sequenced region/
sequenced gene name and genetically characterized gene name.  I
have not seen a specific proposal in this regard.  Currently a
sequence does of course carry a unique identifier in the database in
which it resides and this is often associated with a gene name
supplied by the authors, using whatever convention they choose.  In
the case of coli, those choices are often guided neither by convention nor
by the registry system already in place.  If in sequencing Gene xxx,
an immediately adjacent open reading frame is found, some ad hoc
namers have, without consultation, changed the name of xxx to xxxA
and given the orf the name xxxB, with no functional relationship
having been established or, to a close reader, implied. (A casual
reader may make a different interpretation.)  The relationship of
xxxB to a gene mmmA which has been defined and mapped very near
xxx strictly by genetic means is also not determined, and the
informed reader would understand that.  If further study shows that
xxxB is in fact mmmA, the appropriate name change can be made and
the synonymies defined.  However, that further study may be a long
time in coming, and the separate gene name and sequence name may
long coexist and misrepresent the structure of that region of the
chromosome.  Understanding the source of confusion will often make it
possible to prevent it.  If you establish a convention, either a
Naming or Type-characterization convention, distinguishing between
Sequenced-transcription-units and Functionally (phenotypically)-
defined-genes, then a map can use either the NameConvention or a
SymbolManipulation that conveys to the reader whether the
mapobject site is characterized by one or both of those conditions.
Again, you would have to use common sense to ensure that you do not
try to convey so much information that the name becomes
cumbersome and absurd (for example, arguments that whether a gene is
expressed or not, duplicate or unique, etc., should also be
reflected by the map symbol should probably be quashed).
Alternatively, you could argue that a map is a caveat emptor artifact
and any user not willing or aware enough to go back to the database
and find out just how mmmA and xxxB were defined deserves to be
confused.  For now, in the absence of a convention and an authoritative
database and registry, I think your procedure of giving them
different names until the correlation is unequivocally established is
right.  Once it is cinched, and still in the absence of a convention, it's
your call whether to rename one or both (I think precedent would
come down on the side of favoring the phenotypically, mutationally
defined name), BUT the important thing is to make sure that in this
interim period (and forever after) the link between new names and
former published names is not lost.
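A type-characterization convention of the kind suggested above might look like the following Python sketch. Everything in it is hypothetical: the MapObject fields, the "(seq)" and "(gen)" map-symbol tags, and the correlate function are illustrative names, not an established convention. Following the precedent mentioned above, the merge keeps the phenotypically defined name as canonical (still a judgment call), and it records the sequence-derived name so the link to former published names is not lost.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MapObject:
    name: str
    sequenced: bool    # defined by a sequenced transcription unit or ORF
    phenotypic: bool   # defined genetically, by mutation and phenotype
    former_names: list[str] = field(default_factory=list)


def map_symbol(obj: MapObject) -> str:
    # Hypothetical convention: tag objects that meet only one definition.
    if obj.sequenced and obj.phenotypic:
        return obj.name
    if obj.sequenced:
        return f"{obj.name} (seq)"
    return f"{obj.name} (gen)"


def correlate(seq_obj: MapObject, gen_obj: MapObject) -> MapObject:
    """Merge two map objects once they are shown to be the same gene.

    Keeps the genetically defined name as canonical and carries every
    earlier name forward, so former published names stay linked.
    """
    return MapObject(
        name=gen_obj.name,
        sequenced=seq_obj.sequenced or gen_obj.sequenced,
        phenotypic=seq_obj.phenotypic or gen_obj.phenotypic,
        former_names=gen_obj.former_names + seq_obj.former_names + [seq_obj.name],
    )


xxxB = MapObject("xxxB", sequenced=True, phenotypic=False)
mmmA = MapObject("mmmA", sequenced=False, phenotypic=True)
print(map_symbol(xxxB), map_symbol(mmmA))       # prints: xxxB (seq) mmmA (gen)

merged = correlate(xxxB, mmmA)                  # only after the identity is cinched
print(map_symbol(merged), merged.former_names)  # prints: mmmA ['xxxB']
```

Until correlate is called, the two objects stay distinct and each map symbol says how it was defined, which is the interim state recommended above.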
        (It's not that much different from naming a kid.  You usually
have some sentimental or logical reason, and the culture may have
some conventions, such as all sibs having the same middle name or the
father's last name, and the name persists in the majority of (male, at
least) cases; but conventions are sometimes rejected, renaming
options are sometimes used, and identifying numbers serve to
counteract the ambiguity.  Information may or may not be coded in
the names (rough possible-relationship indication) or in the identifying
numbers (geographic coding for SS#).  The desirability of
convenience and mnemonic aid for the name, and of persistence and
non-ambiguity for the identification symbol, far outweighs the meager
ability to convey information by short notation, and the true
complexity of information should be attached to the id, not be part
of it.)
Mary Berlyn

More information about the Arab-gen mailing list
