In article <1992Dec22.220954.563 at nlm.nih.gov>, ostell at object.nlm.nih.gov (Jim Ostell) writes:
|> Actually there are a large number of differences between EMBL and
|> GenBank, including entries with the same primary accession but
|> different sequences in EMBL and GenBank, entries with the same
|> sequence but different primary accessions in EMBL and GenBank,
|> and all sorts of other variants. There are a variety of reasons
...
One of our customers has brought to our attention that there are
currently thousands (!) of these cases. Contrary to the original
assumption that GenBank 74 would now be more or less identical to
EMBL 33, I can only warn all of you who trusted in this rather than
trying it out. If you run one of the applications that does accession
number exclusion (I use the GCG package for this purpose here), you get
more than 10000 entries carrying accession numbers that are in
GenBank 74 but not in EMBL 33. The listing, keyed by division, is
shown here:
Division    Entries    Accession Numbers
Gb_Ba         1,715                2,692
Gb_In         1,037                1,494
Gb_Om           446                  634
Gb_EST        1,000                1,000
Gb_Ov           487                  694
Gb_Ph            76                  134
Gb_Pl         1,707                2,512
Gb_Pr         3,096                3,743
Gb_Ro         1,933                2,601
Gb_St            61                   63
Gb_Sy             5                    5
Gb_Un             8                    8
Gb_Vi           747                  938
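For readers without GCG, the idea behind such an accession number
exclusion can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not what the GCG package does
internally: the function name, the in-memory tuple layout and the toy
accession numbers are all invented here, and parsing the flat files is
left out.

```python
from collections import defaultdict


def exclusive_accessions(genbank, embl_accessions):
    """Tally GenBank entries that carry accession numbers unknown to EMBL.

    genbank: iterable of (division, entry_name, [accession, ...]) tuples
             (hypothetical layout, primary accession listed first).
    embl_accessions: set of every accession number in the EMBL release.
    Returns {division: (entry_count, accession_count)}.
    """
    entries = defaultdict(int)
    numbers = defaultdict(int)
    for division, _name, accessions in genbank:
        # Keep only the numbers that EMBL does not know at all.
        missing = [acc for acc in accessions if acc not in embl_accessions]
        if missing:
            entries[division] += 1
            numbers[division] += len(missing)
    return {div: (entries[div], numbers[div]) for div in entries}


# Toy data, invented for illustration only.
genbank = [
    ("Gb_Pr", "HUMABC", ["M00001", "J00002"]),
    ("Gb_Pr", "HUMDEF", ["X00003"]),
    ("Gb_Ba", "ECOXYZ", ["D00004"]),
]
embl = {"X00003", "J00002"}

for division, (n_entries, n_numbers) in exclusive_accessions(genbank, embl).items():
    print(f"{division}: Entries: {n_entries} Accession Numbers: {n_numbers}")
```

Note that this sketch counts a missing number once for every entry
that carries it, so per-division accession counts can exceed the number
of distinct accession numbers.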
The statistics read as follows:
Prefix    Numbers total    Primary numbers
D...               1093               1027
J...                 15                  3
K...                  4                  2
L...                219                219
M...                966                107
S...               2900               2899
V...                  3                  0
X...                468                  6
Z...                 30                  6
Total              5698               4269
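A breakdown like the one above can be produced by grouping the missing
accession numbers on their first letter. The following is a minimal
sketch, assuming that "Primary numbers" means those missing numbers
that are the first (primary) accession of at least one entry; the post
does not spell this out, and the function name and sample data are
made up.

```python
from collections import Counter


def tally_by_prefix(entries, missing):
    """Count missing accession numbers per leading letter.

    entries: iterable of accession lists, primary accession first
             (hypothetical layout).
    missing: set of distinct accession numbers absent from EMBL.
    Returns (total, primary) Counters keyed by the first letter.
    """
    # Every distinct missing number counts once towards the total.
    total = Counter(acc[0] for acc in missing)
    # A number is treated as primary if it leads some entry's list.
    primaries = {accs[0] for accs in entries if accs and accs[0] in missing}
    primary = Counter(acc[0] for acc in primaries)
    return total, primary


# Toy data, invented for illustration only.
entries = [["M00001", "J00002"], ["X00003"], ["D00004", "M00005"]]
missing = {"M00001", "D00004", "M00005"}

total, primary = tally_by_prefix(entries, missing)
for letter in sorted(total):
    print(f"{letter}...  {total[letter]:5d}  {primary[letter]:5d}")
```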
The only thing that is not clear to me, then, is why these 5700 numbers
are present in more than 15000 GenBank entries. Clearly a point where
a cleanup might be necessary...
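As a quick cross-check, the two tables above can simply be added up.
The sketch below only sums figures copied from the post; the variable
names are mine. Reading the per-division "Accession Numbers" column as
occurrences (one per entry that carries the number) and the prefix
statistics as distinct numbers is an assumption, not something stated
in the post.

```python
# (entries, accession numbers) per division, copied from the listing.
listing = {
    "Gb_Ba": (1715, 2692), "Gb_In": (1037, 1494), "Gb_Om": (446, 634),
    "Gb_EST": (1000, 1000), "Gb_Ov": (487, 694), "Gb_Ph": (76, 134),
    "Gb_Pl": (1707, 2512), "Gb_Pr": (3096, 3743), "Gb_Ro": (1933, 2601),
    "Gb_St": (61, 63), "Gb_Sy": (5, 5), "Gb_Un": (8, 8),
    "Gb_Vi": (747, 938),
}

# (numbers total, primary numbers) per prefix, copied from the statistics.
stats = {
    "D": (1093, 1027), "J": (15, 3), "K": (4, 2), "L": (219, 219),
    "M": (966, 107), "S": (2900, 2899), "V": (3, 0), "X": (468, 6),
    "Z": (30, 6),
}

total_entries = sum(e for e, _ in listing.values())
total_listed = sum(a for _, a in listing.values())
total_numbers = sum(t for t, _ in stats.values())
total_primary = sum(p for _, p in stats.values())

print("entries in listing:           ", total_entries)
print("accession numbers in listing: ", total_listed)
print("numbers in statistics:        ", total_numbers)
print("primary numbers in statistics:", total_primary)
```

The listing sums to 12,318 entries and 16,518 accession numbers,
against 5,698 numbers (4,269 primary) in the statistics. If the
assumption above holds, a number shared by several entries is counted
once per entry in the listing, which may account for part of the gap.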
Regards
Reinhard
--
+----------------------------------+-------------------------------------+
| Dr. Reinhard Doelz | RFC doelz at urz.unibas.ch |
| Biocomputing | DECNET 20579::48130::doelz |
|Biozentrum der Universitaet | X25 022846211142036::doelz |
| Klingelbergstrasse 70 | FAX x41 61 261- 6760 or 267- 2078
| CH 4056 Basel | TEL x41 61 267- 2076 or 2247 |
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+