EMBL <> GenBank

Reinhard Doelz doelz at comp.bioz.unibas.ch
Wed Dec 23 03:16:43 EST 1992


In article <1992Dec22.220954.563 at nlm.nih.gov>, ostell at object.nlm.nih.gov (Jim Ostell) writes:
|> Actually there are a large number of differences between EMBL and
|> GenBank, including entries with the same primary accession but
|> different sequences in EMBL and GenBank, entries with the same
|> sequence but different primary accessions in EMBL and GenBank,
|> and all sorts of other variants.  There are a variety of reasons
...

One of our customers has pointed out to us that there are currently
thousands (!) of these cases. Contrary to the original assumption that
GenBank 74 would now be more or less identical to EMBL 33, I can only
warn all of you who trusted this rather than checking it yourselves.

If you run one of the applications that does accession number
exclusion (I use the GCG package for this here), you get more than
10,000 entries carrying accession numbers that are in GenBank 74 but
not in EMBL 33. The listing, keyed by division, is shown here:

Gb_Ba:
     Entries:   1,715   Accession Numbers:  2,692
Gb_In:
     Entries:   1,037   Accession Numbers:  1,494
Gb_Om:
     Entries:     446   Accession Numbers:    634
Gb_EST:
     Entries:   1,000   Accession Numbers:  1,000
Gb_Ov:
     Entries:     487   Accession Numbers:    694
Gb_Ph:
     Entries:      76   Accession Numbers:    134
Gb_Pl:
     Entries:   1,707   Accession Numbers:  2,512
Gb_Pr:
     Entries:   3,096   Accession Numbers:  3,743
Gb_Ro:
     Entries:   1,933   Accession Numbers:  2,601
Gb_St:
     Entries:      61   Accession Numbers:     63
Gb_Sy:
     Entries:       5   Accession Numbers:      5
Gb_Un:
     Entries:       8   Accession Numbers:      8
Gb_Vi:
     Entries:     747   Accession Numbers:    938
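
For readers who want to try this kind of check without GCG, the idea of
accession number exclusion can be sketched in a few lines of Python. This
is only a minimal illustration and not how the GCG package works: the
function name exclude_by_accession, the tuple layout of the input, and
the toy accession numbers are all assumptions made up for the example.

```python
from collections import defaultdict


def exclude_by_accession(genbank_entries, embl_accessions):
    """Count, per division, GenBank entries with accession numbers absent from EMBL.

    genbank_entries: iterable of (division, entry_name, [accession, ...]) tuples
                     (hypothetical layout chosen for this sketch).
    embl_accessions: set of every accession number present in EMBL.
    """
    report = defaultdict(lambda: {"entries": 0, "accessions": 0})
    for division, _name, accessions in genbank_entries:
        # Keep only the accession numbers that EMBL does not know about.
        missing = [acc for acc in accessions if acc not in embl_accessions]
        if missing:
            report[division]["entries"] += 1
            report[division]["accessions"] += len(missing)
    return dict(report)


# Made-up toy data, not taken from either database.
genbank = [
    ("Gb_Ba", "ENTRY1", ["M00001", "J00002"]),
    ("Gb_Ba", "ENTRY2", ["X00003"]),
    ("Gb_Pr", "ENTRY3", ["S00004", "S00005"]),
]
embl = {"X00003", "J00002"}

result = exclude_by_accession(genbank, embl)
for division, counts in sorted(result.items()):
    print(f"{division}:")
    print(f"     Entries: {counts['entries']:>7,}   "
          f"Accession Numbers: {counts['accessions']:>6,}")
```

An entry is counted once however many of its accession numbers are
missing, which is why the two columns of such a listing can differ.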

The statistics read as follows: 

Numbers     Total   Primary numbers
D...         1093   1027
J...           15      3
K...            4      2
L...          219    219
M...          966    107
S...         2900   2899
V...            3      0
X...          468      6
Z...           30      6

Total        5698   4269
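
A tally of this shape can be sketched in Python as below. It assumes that
"total" means distinct accession numbers grouped by their first letter,
and that "primary" means those numbers that appear first in some entry;
both readings are assumptions, as are the name tally_by_prefix and the
toy data. The sketch also shows that a distinct count and a per-entry
count need not agree when the same number is listed in several entries,
without claiming that this is what happened in the real data.

```python
from collections import Counter


def tally_by_prefix(accession_lists):
    """Tally distinct accession numbers by their leading letter.

    accession_lists: one list per entry, with the primary accession
                     number first (an assumption of this sketch).
    Returns (total_by_prefix, primary_by_prefix) as Counters.
    """
    distinct = set()
    primary = set()
    for accessions in accession_lists:
        distinct.update(accessions)
        if accessions:
            primary.add(accessions[0])
    total_by_prefix = Counter(acc[0] for acc in distinct)
    primary_by_prefix = Counter(acc[0] for acc in primary)
    return total_by_prefix, primary_by_prefix


# Made-up toy data: M00001 is listed in three different entries.
entries = [
    ["M00001", "J00002"],
    ["M00001"],
    ["S00004", "M00001"],
]

totals, primaries = tally_by_prefix(entries)
occurrences = sum(len(accessions) for accessions in entries)
distinct_count = sum(totals.values())

for prefix in sorted(totals):
    print(f"{prefix}...  {totals[prefix]:>6}  {primaries.get(prefix, 0):>6}")
print(f"occurrences across entries: {occurrences}, distinct numbers: {distinct_count}")
```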

What remains unclear to me is why these roughly 5,700 numbers are
present in more than 15,000 GenBank entries. Clearly this is a point
where a cleanup might be necessary...

Regards
Reinhard

-- 
+----------------------------------+-------------------------------------+
|    Dr. Reinhard Doelz            | RFC     doelz at urz.unibas.ch         |
|      Biocomputing                | DECNET  20579::48130::doelz         |
|Biozentrum der Universitaet       | X25     022846211142036::doelz      |
|   Klingelbergstrasse 70          | FAX     x41 61 261- 6760 or 267- 2078     
|     CH 4056 Basel                | TEL     x41 61 267- 2076 or 2247    |   
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
               -----------------------------------------



More information about the Embl-db mailing list