[Genbank-bb] Summary of changes from one release to another

Cavanaugh, Mark (NIH/NLM/NCBI) [E] cavanaug at ncbi.nlm.nih.gov
Tue Oct 3 10:14:53 EST 2006

We re-processed every non-EST DDBJ record in August, parsing all
of the records into ASN.1 (NCBI's internal format) with our
latest/greatest software. We try to do this once or twice a year.

This resulted in a "new version" of AB000100 being loaded
into our sequence database, even though the record was not
modified by the original DDBJ submitters.

Normally, we shield gbchg.txt so that it does not contain
the accession numbers of records which have changed solely
because of maintenance efforts like these.

But there was a change in our procedures in August, and that
shielding was broken.

Looking at the ownership of all the accessions in the GB 155
gbchg.txt:

 1864152 acc.ddbj
  187572 acc.embl
  940288 acc.gnbk

The 1.8 million figure is quite close to the total number
of non-EST DDBJ records in the database. This value *did*
seem too high in August... But GB 155.0 was several weeks
behind schedule, and I didn't investigate further.
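A tally like the one above can be approximated from a local copy of gbchg.txt. The sketch below is only illustrative. It assumes each line has the "DIVISION|ACCESSION" form seen in the thread (e.g. BCT1|AB000100) and assigns an owner from the accession's leading letters using a hypothetical, deliberately incomplete prefix table. It is not the official INSDC prefix list, and it is not how NCBI produced the acc.* counts.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical, deliberately incomplete prefix table for illustration only;
# it is NOT the authoritative INSDC assignment of accession prefixes.
OWNER_BY_PREFIX = {"AB": "ddbj", "AJ": "embl", "AF": "gnbk"}


def accession_of(line: str) -> str:
    """Return the accession from a line such as 'BCT1|AB000100'."""
    return line.strip().split("|")[-1]


def owner_of(accession: str) -> str:
    """Look up an owner by the accession's leading letters; 'unknown' if unlisted."""
    match = re.match(r"[A-Z]+", accession.upper())
    prefix = match.group(0) if match else ""
    return OWNER_BY_PREFIX.get(prefix, "unknown")


def tally_owners(lines: Iterable[str]) -> Counter:
    """Count gbchg.txt-style lines per owning database, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            counts[owner_of(accession_of(line))] += 1
    return counts


if __name__ == "__main__":
    # Only the first line comes from the thread; the other two are made up.
    sample = ["BCT1|AB000100", "BCT1|AJ000001", "BCT1|AF000001", ""]
    print(tally_owners(sample))
    # With a local copy of the file, one could instead run:
    # with open("gbchg.txt") as fh:
    #     print(tally_owners(fh))
```

Accessions whose prefixes are missing from the table are counted as "unknown". A real tally would need the complete prefix assignments.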

Bottom line: The gbchg.txt file for GenBank 155.0 is unreliable,
containing far too many DDBJ accession numbers. The number of
DDBJ accessions it should have contained is on the order of
145,000.
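Appearing in gbchg.txt therefore does not necessarily mean that a record has a new sequence version. AB000100 was reprocessed but remained at version 1. One way to check for real version changes is to compare ACCESSION.VERSION values between two releases. The sketch below makes several assumptions. It expects flat-file VERSION lines of the form "VERSION     AB000100.1". The function names are hypothetical. Treating an accession that is absent from the newer release as "missing" is only a rough proxy for deletion, because gbdel.txt is the list the release itself provides for that purpose.

```python
import re
from typing import Dict, Iterable, List

# Matches a flat-file VERSION line such as "VERSION     AB000100.1".
VERSION_RE = re.compile(r"^VERSION\s+([A-Z]+\d+)\.(\d+)")


def versions_from_flatfile(lines: Iterable[str]) -> Dict[str, int]:
    """Map accession -> version number, using only VERSION lines."""
    versions: Dict[str, int] = {}
    for line in lines:
        match = VERSION_RE.match(line)
        if match:
            versions[match.group(1)] = int(match.group(2))
    return versions


def compare_releases(old: Dict[str, int], new: Dict[str, int]) -> Dict[str, List[str]]:
    """Classify accessions as added, re-versioned, or missing between two releases."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        # A reprocessed record whose version is unchanged is NOT listed here.
        "new_version": sorted(a for a in shared if new[a] > old[a]),
        "missing": sorted(set(old) - set(new)),
    }


if __name__ == "__main__":
    # Made-up VERSION lines standing in for two releases.
    old_release = ["VERSION     AB000100.1", "VERSION     AB000200.1"]
    new_release = [
        "VERSION     AB000100.1",
        "VERSION     AB000200.2",
        "VERSION     AB000300.1",
    ]
    print(compare_releases(versions_from_flatfile(old_release),
                           versions_from_flatfile(new_release)))
```

Comparing versions directly ignores maintenance-only reprocessing such as the August DDBJ run. However, it also misses annotation-only updates, which do not increment the version.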

My apologies for this error, and we thank Seth Johnson for
bringing it to our attention. As we are about to perform the
same maintenance for all non-EST EMBL records, this was a very
timely inquiry indeed!

Mark Cavanaugh

>From: Seth Johnson [mailto:johnson.biotech at gmail.com] 
>Sent: Tuesday, October 03, 2006 10:09 AM
>To: volker.weinberger at novartis.com
>Cc: genbankb at magpie.bio.indiana.edu
>Subject: Re: [Genbank-bb] Summary of changes from one release 
>to another
>Thanks a lot!  That points me in the right direction.  However,
>I have more questions about "25. gbchg.txt - Accession numbers
>of entries updated since the previous release."  Looking
>through the file, I chose the very first accession number:
>BCT1|AB000100.  GenBank shows the change date of 05-FEB-1999
>and there's only 1 version number of that sequence.  Am I
>mistaken in my belief that the sequence hasn't been changed
>since 1999?
>On 10/3/06, volker.weinberger at novartis.com 
><volker.weinberger at novartis.com > wrote:
>	Dear Seth, 
>	the following files are available on 
>ftp://ftp.ncbi.nih.gov/genbank/  (see gbrel.txt for full list 
>of files): 
>	25. gbchg.txt - Accession numbers of entries updated 
>since the previous release. 
>	29. gbdel.txt - Accession numbers of entries deleted 
>since the previous release. 
>	872. gbnew.txt - Accession numbers of entries new since 
>the previous release. 
>	Best regards, 
>	Volker 
>"Seth Johnson" <johnson.biotech at gmail.com> 
>Sent by: genbankb-bounces at oat.bio.indiana.edu 
>03.10.2006 03:46 
>genbankb at magpie.bio.indiana.edu 	
>[Genbank-bb] Summary of changes from one release to another	
>	Hello,
>	I have a question about changes from one release to 
>another.  Is there a list that summarizes which sequences have 
>been added, which have a new version, and which have been 
>discontinued? Maybe there's an easy way to obtain that kind 
>of information?  Suggestions appreciated. 
>	-- 
>	Best Regards,
>	Seth Johnson
>	Senior Bioinformatics Associate
