GenBank Release 120.0 Available
cavanaug at ncbi.nlm.nih.gov
Wed Oct 18 23:52:55 EST 2000
Greetings GenBank Users,
GenBank Release 120.0 is now available via ftp from the National Center
for Biotechnology Information:
Ftp Site          Directory  Contents
----------------  ---------  ---------------------------------------
ncbi.nlm.nih.gov  genbank    GenBank Release 120.0 flatfiles
                  ncbi-asn1  ASN.1 data used to create Release 120.0
Uncompressed, the Release 120.0 flatfiles require roughly 36464 MB
(sequence files only) or 40709 MB (including the 'index' files). The
ASN.1 version requires roughly 32594 MB. From the release notes:
Release  Date       Base Pairs  Entries
119      Aug 2000   9545724824  8214339
120      Oct 2000  10335692655  9102634
Close-of-data was 10/11/2000. Seven days were required to prepare this
release. In the eight-week period between close-of-data for GenBank 119.0
and GenBank 120.0, GenBank grew by 0.790 billion basepairs and 888,295
sequence records, breaking the 10 Gbp threshold. The most recent doubling
of the database's size has occurred in less than ten months.
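
The growth figures quoted above follow directly from the two rows of the
release-notes table. A minimal Python sketch of that arithmetic is given
here; the dictionary layout and the function name are illustrative only and
are not part of any NCBI tool.

```python
# Totals copied from the release-notes table in this announcement.
RELEASES = {
    119: {"base_pairs": 9_545_724_824, "entries": 8_214_339},
    120: {"base_pairs": 10_335_692_655, "entries": 9_102_634},
}

def growth(old, new):
    """Return (added base pairs, added entries) between two releases."""
    bp = RELEASES[new]["base_pairs"] - RELEASES[old]["base_pairs"]
    entries = RELEASES[new]["entries"] - RELEASES[old]["entries"]
    return bp, entries

bp_added, entries_added = growth(119, 120)
# Prints: 0.790 billion base pairs, 888,295 entries
print(f"{bp_added / 1e9:.3f} billion base pairs, {entries_added:,} entries")
```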
For additional release information, see the README files in either of the
directories mentioned above, and the release notes (gbrel.txt) in the
genbank directory. Sections 1.3 and 1.4 of the release notes (Changes in
Release 120.0 and Upcoming Changes) have been appended below.
*NOTE* The gbrod.seq data file is at 248 MB in this release, so it is
likely to be split into two pieces for GenBank 121.0. This wasn't noticed
in time for inclusion in the Upcoming Changes section of the release notes.
Release 120.0 data are currently available via NCBI's Entrez and Blast
servers, and the 'query' email server.
New GenBank cumulative update files (gbcu.flat.Z and gbcu.aso.Z), containing
only those entries new/updated since the Release 120.0 close-of-data, should be
available by 10:00am EDT, October 19. Please note that the new CUs will be
smaller than previous versions you might have obtained after Release 119.0 was
If you encounter problems while ftp'ing or uncompressing Release 120.0,
please send email outlining your difficulties to info at ncbi.nlm.nih.gov .
Mark Cavanaugh, Vladimir Alekseyev, Anton Butanaev
1.3 Important Changes in Release 120.0
1.3.1 Organizational changes
Due to database growth, the EST division is now being split into eighty-seven
pieces.
Due to database growth, the GSS division is now being split into twenty-eight
pieces.
Due to database growth, the HTG division is now being split into twenty-four
pieces.
Due to database growth, the PRI division is now being split into eight pieces.
Due to database growth, the PAT division is now being split into two pieces:
gbpat1.seq and gbpat2.seq .
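
For scripts that fetch or process the split divisions, the piece names can be
generated rather than typed out. The sketch below assumes that every split
division follows the gb<div><n>.seq pattern shown for PAT (gbpat1.seq,
gbpat2.seq); that pattern is an assumption for the other divisions, and the
function name is hypothetical.

```python
# Piece counts as stated in section 1.3.1 of the release notes.
PIECES = {"est": 87, "gss": 28, "htg": 24, "pri": 8, "pat": 2}

def piece_names(division, count):
    """List flatfile names for a division split into `count` pieces.

    Assumption: all split divisions use the gb<div><n>.seq pattern that
    the notes show for PAT.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return [f"gb{division.lower()}{n}.seq" for n in range(1, count + 1)]

print(piece_names("pat", PIECES["pat"]))  # ['gbpat1.seq', 'gbpat2.seq']
```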
1.3.2 Order of entries in the Short Directory file has changed
The gbsdr.txt file which accompanies GenBank releases contains the
DEFINITION line and number of bases for every sequence. For historical
reasons, a specific division ordering was used for the sections of this
file. This order began with:
and ended with:
SEQUENCE TAGGED SITE
GENOME SURVEY SEQUENCE
HIGH THROUGHPUT GENOMIC SEQUENCING
Starting with GenBank Release 120.0, the ordering of the sections
of this file is now determined solely by the sort-order of the
GenBank division codes. For example:
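
As a rough sketch of what this rule implies, the Python below sorts a handful
of division codes. It assumes that "sort-order" means plain ascending string
order of the three-letter codes, and it uses only the divisions named in this
announcement, not the complete set; the function name is hypothetical.

```python
# Division codes mentioned in this announcement; not the complete list.
codes = ["PRI", "ROD", "PAT", "STS", "EST", "GSS", "HTG"]

def section_order(division_codes):
    """Order gbsdr.txt sections by division code (assumed plain string sort)."""
    return sorted(code.upper() for code in division_codes)

print(section_order(codes))
# ['EST', 'GSS', 'HTG', 'PAT', 'PRI', 'ROD', 'STS']
```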
1.3.3 The gbaut.idx file has been split into multiple pieces
The gbaut.idx file exceeded 3 GB in size for GenBank 119.0, so it has
been split into seven pieces of approximately 500 MB each:
1.4 Upcoming Changes
1.4.1 NCBI's ftp address will be changed
At some point in the near future NCBI's ftp address will be changed.
The current address:
Additional details about this change will be made available via these
release notes and the GenBank newsgroup (bionet.molbio.genbank) as they
become available.
1.4.2 Selenocysteine representation
Selenocysteine residues within the protein translations of coding
region features have been represented in GenBank via the letter 'X'
and a /transl_except qualifier. At the May 1999 DDBJ/EMBL/GenBank
collaborative meeting, it was learned that IUPAC plans to adopt the
letter 'U' for selenocysteine.
DDBJ, EMBL, and GenBank will thus use this new amino acid abbreviation
for their /translation qualifiers. Although a timetable for its appearance
has not been finalized, we are mentioning this now because the introduction
of a new residue abbreviation is a fairly fundamental change.
Details about the use of 'U' will be made available via these release
notes and the GenBank newsgroup as they become available.
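
Software that validates /translation strings against a fixed residue alphabet
is the most likely place for this change to cause trouble. The sketch below
shows the kind of check that would start rejecting records once 'U' appears.
The alphabets, the function name, and the sample sequence are assumptions for
illustration; real parsers may accept further letters.

```python
# The 20 standard one-letter residue codes, plus 'X', which the notes say
# currently stands in for selenocysteine.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")
CURRENT_ALPHABET = STANDARD | {"X"}
# Alphabet after the planned change: 'U' is accepted as well.
FUTURE_ALPHABET = CURRENT_ALPHABET | {"U"}

def unexpected_residues(translation, alphabet):
    """Return the sorted residue letters in `translation` not in `alphabet`."""
    return sorted(set(translation.upper()) - alphabet)

seq = "MSGUCK"  # made-up fragment, not taken from any GenBank record
print(unexpected_residues(seq, CURRENT_ALPHABET))  # ['U']
print(unexpected_residues(seq, FUTURE_ALPHABET))   # []
```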
1.4.3 New REFERENCE type for on-line journals
Agreement was reached at the May 1999 collaborative DDBJ/EMBL/GenBank
meeting that an effort should be made to accommodate references which are
published only on-line. Until specifications for such references are
available from library organizations, GenBank will present them in a manner
REFERENCE   1  (bases 1 to 2858)
  AUTHORS   Smith, J.
  TITLE     Cloning and expression of a phospholipase gene
  JOURNAL   Online Publication
  REMARK    Online-Journal-name; Article Identifier; URL
This format is still tentative; additional information about this new
reference type will be made available via these release notes.
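
For anyone planning parser changes, the tentative REMARK line above could be
read along these lines. This is only a sketch: it assumes the three fields
stay separated by semicolons in the order shown, which may change before the
format is final, and the function and field names are hypothetical.

```python
def parse_online_remark(line):
    """Split a tentative REMARK line into its three ';'-separated fields."""
    keyword, _, value = line.strip().partition(" ")
    if keyword != "REMARK":
        raise ValueError("not a REMARK line")
    # Split at most twice so that a ';' inside the URL is left intact.
    parts = [part.strip() for part in value.split(";", 2)]
    if len(parts) != 3:
        raise ValueError("expected: journal name; article identifier; URL")
    return dict(zip(("journal", "article_id", "url"), parts))

example = "  REMARK    Online-Journal-name; Article Identifier; URL"
print(parse_online_remark(example))
```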
- GenBank newsgroup see: http://www.bio.net/hypermail/genbankb/
- GENBANKB e-mail: messages sent to genbankb at net.bio.net
- subscribe: e-mail biosci-server at net.bio.net with: subscribe genbankb
- unsub: e-mail biosci-server at net.bio.net with: unsubscribe genbankb
- GenBank on the WWW, see: http://www.ncbi.nlm.nih.gov/Genbank/
- problems with GENBANKB? E-mail moderator: francis at cmmt.ubc.ca