using gene names for genbank entries

Tom Schneider toms at fcs260c2.ncifcrf.gov
Thu Oct 31 17:14:47 EST 1991


In article <9110291754.AA01092 at genbank.bio.net> jecop at GENGENP.RUG.AC.BE
 (Jeroen Coppieters) writes:

>It is proposed here to use a naming system comparable to the E coli gene names
>for objects in sequence databases. This seems a very simplified idea to me.
>The naming system should not only include genes. Many sequence objects in the DB
>contain no gene at all, but still need a name. What is the genetic name for a
>198 bp repeat from Arabidopsis thaliana for which no genetic function is known
>yet?

If you found it, give it a name!  I suggest "Fred".  :-)

>and there is a second one of 179 bp with some common characteristics
>We could call them A(rabidopsis)THA(liana)R(epeat)11 and ATHAR12. You need
>a system of naming conventions for ALL independent objects, from all
>kinds of organisms.

Unfortunately people have already named things, and most people would be quite
confused if the names were changed.  Perhaps you can propose a consistent
system though, like what the chemists have done!

>And it will be a huge task to create an accepted
>standard.

Yes.  The chemists seem to be on the way though!

>It seems to me that scientists working with one organism still
>have problems to get to a standard amongst themselves.

Yes.  However, if one uses the species name as part of the label, then all
names within the scope of the species can be consistent.  In this way, we
introduce the concept of variable scope from computer languages into our
problem.  This is the way people do it anyway, so it is not really a big
change.
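
A minimal sketch of the scoping idea in Python, assuming a simple in-memory
registry.  The function names, the "R11"/"R12" labels and the second species
entry for "orf1" are hypothetical, chosen only to show that a name needs to
be unique within its species and not across the whole database.

```python
# Hypothetical sketch: the species name acts as a scope for gene names,
# much as a block or module scopes variable names in a program.
registry = {}


def register(species, name, description):
    """Add a name inside the scope of one species; reject duplicates there."""
    scope = registry.setdefault(species, {})
    if name in scope:
        raise ValueError(f"{name!r} is already defined for {species}")
    scope[name] = description


def lookup(species, name):
    """Resolve a name within the scope of the given species."""
    return registry[species][name]


register("Escherichia coli", "lacZ", "beta-galactosidase")
register("Escherichia coli", "lacI", "lac repressor")
# Illustrative labels for the two repeats discussed above (assumed names).
register("Arabidopsis thaliana", "R11", "198 bp repeat, function unknown")
register("Arabidopsis thaliana", "R12", "179 bp repeat, function unknown")
# The same short name can be reused in two scopes without a clash.
register("Escherichia coli", "orf1", "open reading frame, function unknown")
register("Arabidopsis thaliana", "orf1", "open reading frame, function unknown")

print(lookup("Escherichia coli", "lacZ"))
print(lookup("Arabidopsis thaliana", "R11"))
```

Reusing "orf1" in both species raises no error, while registering "lacZ"
twice for E. coli would.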

>And to name objects, let's first define what an independent object is.

Well, we can just start making a list.  Ribosome binding sites are always
the first on my list,  but really promoters should be, then there's...

There are problems with the way GenBank assigns these things.  For example,
they assign specific ranges to binding sites, whereas this may not be supported
by the data.  That is, conservation (as measured by the information content)
can be outside the defined range.  I think it is better to give a single
zero base for each site, together with its orientation on the sequence.
At least the present method works, though it is not great.
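
As a rough sketch of the zero base plus orientation idea, again in Python:
the class name, field names and coordinates below are all hypothetical, and
this is not how GenBank stores features.  The point is only that one
anchored position and a direction are enough to map between site-relative
positions and sequence coordinates, so the extent of the site can be
decided later from the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteAnchor:
    """A binding site recorded as a zero base and an orientation (assumed design)."""
    zero: int         # sequence coordinate of the site's zero base
    orientation: int  # +1: site runs with the sequence numbering, -1: against it

    def to_sequence(self, offset):
        """Map a site-relative position (e.g. -20) to a sequence coordinate."""
        return self.zero + self.orientation * offset

    def to_site(self, coordinate):
        """Map a sequence coordinate back to a site-relative position."""
        return self.orientation * (coordinate - self.zero)


# Made-up coordinates: two sites anchored at base 1000, in opposite orientations.
forward = SiteAnchor(zero=1000, orientation=+1)
reverse = SiteAnchor(zero=1000, orientation=-1)

print(forward.to_sequence(-20))  # 980
print(reverse.to_sequence(-20))  # 1020
print(reverse.to_site(1020))     # -20
```

No range is stored, so conservation found outside a previously assumed
window does not contradict the annotation.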

>Secondly: I work in a lab generating lots of sequence (900000 bp this year,
>resulting in about 70 kb of corrected sequences)
>from all kinds of plant and bacterial organisms.
>We sometimes run into genes for which we do not have the slightest
>idea about their function. How do we give them a genetic name?

Some groups name open reading frames orf1, orf2... But I think the folks
working on Drosophila have a neat idea, and the names there are pretty nifty
(futz, bicaudal, etc).  How about naming the genes after the streets in your
home town?  The point is that although name tags are easy to remember, they
are really just arbitrary strings.  Z?  Why lacZ?  I don't know!  But you and I
both know how different that is from lacI or lacY!

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms at ncifcrf.gov


