"source" feature key

frist at ccu.umanitoba.ca
Wed Jul 7 12:52:22 EST 1993


The current Feature Table Definition (Version 1.04) describes a new feature
key:

- - - - - - - - - - - - - - (from FT Definition) - - - - - - - - - - - - 
    Feature key        source

    Definition: identifies the biological source of the specified span of the
    sequence. This key is mandatory. Every entry will have, as a minimum,
    a single source key spanning the entire sequence. More than one source
    key per sequence is permittable.

    Mandatory qualifiers:  /organism="text"

    Optional qualifiers:   a whole bunch, including /label=feature_label
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

This is a very welcome addition to the Feature Table (FT) language. However,
I have some questions.

1) Source does not appear to be implemented as of GenBank Release 77.0.
Is it likely to be implemented soon? Has it only been implemented in new
entries, so that I just haven't seen it yet?

2) Can we have a few working examples of source in use? There are a number
of usage conventions that need to be agreed upon. Let me propose that 
it might be very convenient if the 'source' label field was just the 
ORIGINAL accession number for that span:

ORIGINAL entry:
     source                1..1089
                           /label=X12345
                           /organism="Unicornus mythicalis"
     exon                  103..712
                           /label=X12345:exon1
                           /note="This feature added for illustrative
                           purposes" 

SAME FEATURES AFTER MERGER INTO LARGER ENTRY:
     source                134231..135319
                           /label=X12345
                           /organism="Unicornus mythicalis"
     exon                  134333..134942
                           /label=X12345:exon1
                           /note="This feature added for illustrative
                           purposes" 

Note that I am proposing the convention that all features other than
source have labels that are FT expressions, incorporating the original
accession number. This convention ensures that each label is UNIQUE
ACROSS THE ENTIRE DATABASE, and NEVER CHANGES. To illustrate a point,
one might merge a bunch of sequences representing a chromosomal region
into a larger entry. Prior to merger, many of the component entries might
have features labeled "exon1", and so their labels would be ambiguous.
Incorporating the accession # into the label prevents this problem.
Another ramification of the conventions I have shown is that even base
ranges remain valid. For example:

X12345:103..712 

would still return the same exon from the original entry
prior to merger, because the software can reconstruct X12345 from the
new entry.
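
To make the arithmetic concrete, here is a minimal sketch in Python of how
software might resolve such an expression against the merged entry. It is
only an illustration under my proposed convention, not anything defined in
the FT Definition: the Source tuple and the resolve() function are names I
made up, and it assumes that each source feature carries the original
accession number as its label and that the span was merged without
insertions, deletions, or reverse-complementing.

```python
import re
from typing import List, NamedTuple, Tuple


class Source(NamedTuple):
    """A source feature (hypothetical structure, for illustration only)."""
    label: str   # original accession number, e.g. "X12345"
    start: int   # first base of the span in the current entry (1-based)
    end: int     # last base of the span in the current entry (inclusive)


def resolve(expression: str, sources: List[Source]) -> Tuple[int, int]:
    """Map 'ACCESSION:from..to', given in ORIGINAL coordinates, onto the
    coordinates of the entry that now contains that source span."""
    match = re.fullmatch(r"(\w+):(\d+)\.\.(\d+)", expression.strip())
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    accession = match.group(1)
    lo, hi = int(match.group(2)), int(match.group(3))
    for src in sources:
        if src.label == accession:
            length = src.end - src.start + 1
            if not 1 <= lo <= hi <= length:
                raise ValueError(f"range {lo}..{hi} lies outside {accession}")
            offset = src.start - 1   # shift from original to merged numbering
            return lo + offset, hi + offset
    raise KeyError(f"no source feature labeled {accession}")


# Source spans taken from the two example entries above.
original = [Source("X12345", 1, 1089)]
merged = [Source("X12345", 134231, 135319)]

print(resolve("X12345:103..712", original))  # (103, 712)
print(resolve("X12345:103..712", merged))    # (134333, 134942)
```

The second result matches the exon location shown in the merged entry, so
the same expression finds the same bases before and after the merger.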

It is my feeling that we need to build mechanisms like this into
the databases that will protect FT expressions from obsolescence.
Using conventions such as this, one can maintain lists of expressions
that will always retrieve the same sequence, regardless of how much
merging and correction goes on among entries. 
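
The labeling half of the convention can be sketched the same way. Again,
this is only an illustration in Python: qualify() and merged_labels() are
hypothetical names, and X67890 is a made-up second accession number added
purely to show the collision between two features both called "exon1".

```python
from collections import Counter
from typing import Dict, List


def qualify(accession: str, label: str) -> str:
    """Prefix a feature label with its entry's accession number, once."""
    prefix = accession + ":"
    return label if label.startswith(prefix) else prefix + label


def merged_labels(entries: Dict[str, List[str]]) -> List[str]:
    """Collect the qualified labels of all component entries, in order."""
    return [qualify(acc, lab) for acc, labs in entries.items() for lab in labs]


# X12345 is the accession from the example above; X67890 is invented here.
entries = {"X12345": ["exon1"], "X67890": ["exon1", "exon2"]}

# Bare labels collide once the entries are merged ...
bare = [lab for labs in entries.values() for lab in labs]
print(Counter(bare)["exon1"])   # 2

# ... while accession-qualified labels stay unique.
print(merged_labels(entries))   # ['X12345:exon1', 'X67890:exon1', 'X67890:exon2']
```

Because qualify() leaves an already-qualified label alone, a label such as
X12345:exon1 stays the same no matter how many times its entry is merged.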

3) The wording of the definition implies that every entry must now have
a Feature Table, even if only to have a source field. Is that true?

I guess I've done more than simply ask questions here, but I think
these points were worth bringing up. 

===============================================================================
Brian Fristensky                | 
Department of Plant Science     |  A question is like a knife that slices
University of Manitoba          |  through the stage backdrop and gives us
Winnipeg, MB R3T 2N2  CANADA    |  a look at what lies hidden behind.
frist at ccu.umanitoba.ca          |  
Office phone:   204-474-6085    |  Milan Kundera, THE UNBEARABLE LIGHTNESS 
FAX:            204-261-5732    |  OF BEING
===============================================================================


