On the usefulness of sequences in the databases

Reinhard Doelz doelz at comp.bioz.unibas.ch
Thu Jun 24 12:06:58 EST 1993


Colleagues, 
I fully sympathize with the eagerness to collect anything and everything
in the sequence data arena. I am therefore currently comparing GENBANK 77
with EMBL 35 and trying to account for the differences. Most of them can
be explained; however, look at the following sequence:

LOCUS       A00674          6 bp    DNA             PAT       29-JAN-1993
DEFINITION  Nucleotide sequence 3 from patent number WO8601533
ACCESSION   A00674
KEYWORDS    .
SOURCE      Unknown
  ORGANISM  Unknown
            Unclassified.
REFERENCE   1  (bases 1 to 6)
  AUTHORS   
  TITLE     'PRODUCTION OF CHIMERIC ANTIBODIES'
  JOURNAL   Patent: WO 8601533-A 3 13-MAR-1986;
  STANDARD  full automatic
BASE COUNT        3 a      2 c      0 g      1 t
ORIGIN      
        1 cactaa


I am starting to wonder what the purpose of these sequences is. I am aware
that this is an extreme case, but there are several more very short
sequences. In particular, these 'patent' data are of more than doubtful
quality. However, as both EMBL and GENBANK incorporate them, I need to
explain to my customers why a sequence database needs to hold a
hexanucleotide that occurs 28340 times in over 70000 sequences.
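
For anyone who wants to check a figure like this against their own copy
of the data, here is a minimal Python sketch of such a count. It is not
the procedure used for the number above. The function name count_motif
and the toy entries are made up for illustration, and the sketch assumes
the sequences are already available as plain strings, searched on the
given strand only. It reports both the total number of occurrences and
the number of entries containing the motif, since a count like the one
quoted could mean either. As a rough yardstick, under the simplifying
assumption of uniform and independent bases, any given hexamer is
expected about once every 4^6 = 4096 positions.

```python
def count_motif(sequences, motif):
    """Return (total occurrences, number of sequences containing motif).

    Overlapping occurrences are counted. Matching is case-insensitive
    and looks at the given strand only (no reverse complement).
    """
    motif = motif.lower()
    total = 0
    containing = 0
    for seq in sequences:
        seq = seq.lower()
        hits = 0
        start = seq.find(motif)
        while start != -1:
            hits += 1
            # Step one base forward so overlapping matches are found too.
            start = seq.find(motif, start + 1)
        total += hits
        if hits:
            containing += 1
    return total, containing


# Toy data, invented for illustration only (not real database entries).
toy_entries = ["cactaa", "ggcactaattcactaa", "acgtacgt"]
total, containing = count_motif(toy_entries, "cactaa")
print(total, containing)
```

In practice the strings would come from parsing the ORIGIN blocks of the
flat-file entries, which this sketch leaves out.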


Regards
Reinhard 
  


-- 
+----------------------------------+-------------------------------------+
|    Dr. Reinhard Doelz            | RFC     doelz at urz.unibas.ch         |
|      Biocomputing                | DECNET  20579::48130::doelz         |
|Biozentrum der Universitaet       | X25     022846211142036::doelz      |
|   Klingelbergstrasse 70          | FAX     x41 61 261- 6760 or 267- 2078     
|     CH 4056 Basel                | TEL     x41 61 267- 2076 or 2247    |   
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
               -----------------------------------------


