EMBL <> GenBank

Reinhard Doelz rdoelz at comp.bioz.unibas.ch[remote]
Wed Dec 23 14:31:47 EST 1992


In article <1992Dec23.160654.60156 at embl-heidelberg.de>, stoehr at embl-heidelberg.de writes:
|> In article <1992Dec23.081643.11004 at comp.bioz.unibas.ch>,
|> doelz at comp.bioz.unibas.ch (Reinhard Doelz) writes:
|> > 
|> > It has been brought to our attention by one of our customers that there 
|> > are currently thousands (!) of these cases. In contrast to the original 
|> > assumption that Genbank 74 will now be sort of identical to EMBL 33, I 
|> > can only warn all of you who trusted in this rather than trying it out. 
|> > 
|> 
|> Firstly, the assumption that GenBank 74 should be 'sort of identical' to
|> EMBL 33 is false (depending on how identical 'sort of' means). The two
|> databases are made at different times.

Sorry if this sounded misleading. I trusted the release notes, which state 
"New and updated sequence data from the latest GenBank and  DDBJ  releases
have been  incorporated  into  this release." With EMBL Release 33 frozen 
on 8 November 1992 and GenBank 74 on 17 November 1992, I thought this 
implied that the most recent daily updates from GenBank would also be in 
EMBL, and vice versa. 

|> We do not yet have Genbank 74 here: when we do, as for other quarterly
|> releases, we determine what we are missing and work to include it all in the
|> EMBL database. I'm surprised at the figure of 10,000, as I believe that when
|> we made EMBL 33 that all acc#'s from the previous GenBank release were in.
|> Another current difference is the accession numbers beginning with 'S' (about
|> 3000 according to your figures) which we do not include in EMBL yet - but
|> that's a different story.

I used the -exclude flag in the GCG software to compute the numbers that I 
quoted. The number of entries surprises me too, and I just hope that 
there is an error in my procedure. 

With respect to the numbers: I took the daily EMBL updates (as of Dec 20
at EMBL creation) and the GenBank 74 release files (not their daily updates), 
and came up with 8141 EMBL entries, representing 8540 accession numbers. 
(This covers new entries from both EMBL and GenBank, as sent by EMBL in 
EMBL format.) If I look at how many entries are in GenBank 74 but not in 
EMBL 33 (containing 5698 accession numbers), and subtract those which are 
unique to EMBL, it appears that the EMBL updates contain 2591 entries which 
are in GenBank 74 (corresponding to 2612 accession numbers). That still 
leaves us with a considerable number of discrepancies. 
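
For anyone who wants to cross-check such counts independently, here is a 
minimal sketch of an accession-number comparison. It is not the GCG -exclude 
procedure itself. It assumes the flat-file conventions of an "AC" line with 
semicolon-separated numbers in EMBL format and an "ACCESSION" line with 
space-separated numbers in GenBank format. The function names and the sample 
accession numbers are made up for illustration.

```python
def embl_accessions(lines):
    """Collect accession numbers from EMBL-format 'AC' lines."""
    accs = set()
    for line in lines:
        if line.startswith("AC "):
            # Drop the 'AC' line code, then split on ';' separators.
            accs.update(tok.strip() for tok in line[2:].split(";") if tok.strip())
    return accs


def genbank_accessions(lines):
    """Collect accession numbers from GenBank-format 'ACCESSION' lines."""
    accs = set()
    for line in lines:
        if line.startswith("ACCESSION"):
            # Everything after the keyword is a whitespace-separated list.
            accs.update(line.split()[1:])
    return accs


def exclusion_set(have, other):
    """Accession numbers present in `other` but absent from `have`."""
    return sorted(other - have)


# Hypothetical sample records (accession numbers are invented).
embl_lines = ["ID   EXAMPLE1", "AC   X12345; X67890;", "AC   M11111;"]
genbank_lines = ["LOCUS       EXAMPLE1", "ACCESSION   X12345 S99999", "ACCESSION   M11111"]

embl = embl_accessions(embl_lines)
genbank = genbank_accessions(genbank_lines)

missing_in_embl = exclusion_set(embl, genbank)
unique_to_embl = exclusion_set(genbank, embl)
```

Note that one entry can carry several accession numbers, which is why entry 
counts and accession-number counts differ in the figures above. The sketch 
compares accession numbers only, not entries.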
|> 
|> Regards,
|> Peter Stoehr
|> EMBL Data Library

Your work is most appreciated, and I encourage you to keep up the good 
work. I didn't want to make any negative statements on quality in general, 
just to point out that the exclusion sets are still needed. 

Regards 
Reinhard 

-- 
+----------------------------------+-------------------------------------+
|    Dr. Reinhard Doelz            | RFC     doelz at urz.unibas.ch         |
|      Biocomputing                | DECNET  20579::48130::doelz         |
|Biozentrum der Universitaet       | X25     022846211142036::doelz      |
|   Klingelbergstrasse 70          | FAX     x41 61 261- 6760 or 267- 2078     
|     CH 4056 Basel                | TEL     x41 61 267- 2076 or 2247    |   
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
               -----------------------------------------


