On the usefulness of sequences in the databases
Reinhard Doelz
doelz at comp.bioz.unibas.ch
Thu Jun 24 12:06:58 EST 1993
Colleagues,
I fully sympathize with the eagerness to collect anything and everything
in the sequence data arena. I am therefore currently comparing GENBANK 77
with EMBL 35 and trying to figure out the differences. Most can be
explained; however, look at the following sequence:
LOCUS       A00674          6 bp    DNA             PAT       29-JAN-1993
DEFINITION  Nucleotide sequence 3 from patent number WO8601533
ACCESSION   A00674
KEYWORDS    .
SOURCE      Unknown
  ORGANISM  Unknown
            Unclassified.
REFERENCE   1  (bases 1 to 6)
  AUTHORS
  TITLE     'PRODUCTION OF CHIMERIC ANTIBODIES'
  JOURNAL   Patent: WO 8601533-A 3 13-MAR-1986;
  STANDARD  full automatic
BASE COUNT        3 a      2 c      0 g      1 t
ORIGIN
        1 cactaa
I am starting to wonder what the purpose of these sequences is. I am aware
that this is an extreme case, but there are a few more very short
sequences. In particular, the quality of these 'patent' data is more than
doubtful. However, as both EMBL and GENBANK incorporate them, I need to
explain to my customers why a sequence database needs a hexanucleotide
that occurs 28340 times in over 70000 sequences.
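For anyone who wants to run this kind of check on their own data, here is a minimal sketch in Python of how such a count could be made. It is not the procedure used for the figures above. The function name count_motif and the toy sequences are hypothetical, invented for illustration only, and reading the entries out of the GENBANK or EMBL flat files is left out. Since a count like this can mean either the total number of hits or the number of entries with at least one hit, the sketch reports both.

```python
def count_motif(sequences, motif):
    """Return (total_hits, sequences_hit) for motif.

    Overlapping matches are counted, and the comparison ignores case.
    """
    motif = motif.lower()
    total_hits = 0
    sequences_hit = 0
    for seq in sequences:
        seq = seq.lower()
        hits = 0
        start = seq.find(motif)
        while start != -1:
            hits += 1
            # Advance by one base so that overlapping matches are found too.
            start = seq.find(motif, start + 1)
        total_hits += hits
        if hits:
            sequences_hit += 1
    return total_hits, sequences_hit


# Toy input, invented for illustration; not taken from GENBANK or EMBL.
toy_sequences = ["cactaa", "ggcactaacactaa", "ttttgggg", "CACTAACTAA"]
total, containing = count_motif(toy_sequences, "cactaa")
print(total, containing)
```

On a real release the list of sequences would come from whatever flat-file reader is at hand; only the counting step is shown here.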
Regards
Reinhard
--
+----------------------------------+-------------------------------------+
| Dr. Reinhard Doelz | RFC doelz at urz.unibas.ch |
| Biocomputing | DECNET 20579::48130::doelz |
|Biozentrum der Universitaet | X25 022846211142036::doelz |
| Klingelbergstrasse 70 | FAX x41 61 261- 6760 or 267- 2078 |
| CH 4056 Basel | TEL x41 61 267- 2076 or 2247 |
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+