GenBank Release 122.0 Now Available

Mark Cavanaugh cavanaug at
Fri Feb 23 17:01:06 EST 2001

Greetings GenBank Users,

  GenBank Release 122.0 is now available via ftp from the National Center
for Biotechnology Information:

  Ftp Site           Directory   Contents
  ----------------   ---------   ---------------------------------------
                     genbank     GenBank Release 122.0 flatfiles
                     ncbi-asn1   ASN.1 data used to create Release 122.0

  Uncompressed, the Release 122.0 flatfiles require roughly 42698 MB
(sequence files only) or 47573 MB (including the 'index' files).  The
ASN.1 version requires roughly 38419 MB. From the release notes:

   Release  Date       Base Pairs   Entries

   121      Dec 2000   11101066288  10106023
   122      Feb 2001   11720120326  10896781

  Close-of-data was 02/13/2001. Seven business days were required to prepare
this release. In the eight-week period between close-of-data for GenBank 121.0
and GenBank 122.0, GenBank grew by 0.619 billion basepairs and 790,758
sequence records, and the number of human bases broke the 7 Gbp threshold.
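The growth figures quoted above can be re-derived from the two rows of the release table. The following is a minimal sketch in Python; the dictionary layout and the function name `growth` are illustrative assumptions, and only the four numbers come from the release notes.

```python
# Figures copied from the release-notes table (Releases 121 and 122).
releases = {
    121: {"base_pairs": 11_101_066_288, "entries": 10_106_023},
    122: {"base_pairs": 11_720_120_326, "entries": 10_896_781},
}

def growth(old, new):
    """Return (added base pairs, added entries) between two releases."""
    return (
        new["base_pairs"] - old["base_pairs"],
        new["entries"] - old["entries"],
    )

added_bp, added_entries = growth(releases[121], releases[122])
print(f"{added_bp / 1e9:.3f} billion basepairs, {added_entries:,} records")
```

The differences are 619,054,038 basepairs and 790,758 records, which agree with the rounded values stated in the announcement.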

  We would like to remind our users that a GenBank mirror site is
available at . Please consider using this site
in order to speed up your transfer of GenBank releases.

  For additional release information, see the README files in either of the
directories mentioned above, and the release notes (gbrel.txt) in the
genbank directory. Sections 1.3 and 1.4 of the release notes (Changes in
Release 122.0 and Upcoming Changes) have been appended below.

  Release 122.0 data are currently available via NCBI's Entrez and Blast
servers, and the 'query' email server.

  New GenBank cumulative update files (gbcu.flat.Z and gbcu.aso.Z), containing
only those entries new/updated since the Release 122.0 close-of-data, should be
available by 07:00am EST, February 24. Please note that the new CUs will be
smaller than previous versions you might have obtained after Release 121.0 was
released.

  If you encounter problems while ftp'ing or uncompressing Release 122.0,
please send email outlining your difficulties to info at .

Mark Cavanaugh, Vladimir Alekseyev, Anton Butanaev

1.3 Important Changes in Release 122.0

1.3.1 Organizational changes

  Due to database growth, the EST division is now being split into 106 pieces.

  Due to database growth, the GSS division is now being split into 34 pieces.

  Due to database growth, the HTG division is now being split into 25 pieces.

  Due to database growth, the STS division is now being split into 3 pieces.

1.3.2 Alternative GenBank FTP site

  A mirror of the GenBank FTP site at the NCBI is available from the San Diego
Supercomputer Center:

  Some users who experience slow FTP transfers of large files (entire releases,
GenBank Cumulative Update, etc.) might find an improvement in transfer rates from
this alternate site when traffic at the NCBI is high.

1.4 Upcoming Changes

1.4.1 New HTC division to be introduced

  A new GenBank division for unfinished high-throughput cDNA sequencing (HTC)
will be included in GenBank releases starting in April 2001 (Release 123.0).
HTC sequences may have 5'UTR and 3'UTR at their ends, partial coding regions,
and introns. A keyword of "HTC" will be present, in addition to division code
"HTC". Those HTC sequences that undergo finishing (e.g., re-sequencing) will move
to the appropriate taxonomic GenBank division and the "HTC" keyword will be
removed. A recent project that generates HTC-quality data is described in:

	Hayashizaki, Y.
	Functional annotation of a full-length mouse cDNA collection
	Nature 409, 685-690 (2001)

1.4.2 Minor change to REFERENCE line

  The REFERENCE keyword for the literature citations associated with a GenBank
record currently requires a parenthetical component indicating either the
basepair span to which the citation applies, or "sites" for citations providing
annotation rather than sequence data. Here are some examples:

	REFERENCE   1  (bases 1 to 262290)
	REFERENCE   2  (sites)
	REFERENCE   3  (bases 1 to 456; bases 700 to 2334)

  In some cases, sequence updates provided by submitters can involve a large
number of changes, and sometimes a submitter does not wish to indicate
exactly _which_ basepair spans are involved. Accordingly, we will change the
definition of the REFERENCE line to make the parenthetical component an
optional element as of GenBank Release 123.0 (April 2001).
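Software that reads flatfiles may need to tolerate a REFERENCE line with no parenthetical once this change takes effect. The sketch below shows one way to do that in Python. The regular expressions and the function name `parse_reference` are assumptions made for illustration, inferred only from the three example lines above; they are not an official grammar for the REFERENCE line.

```python
import re

# Assumed layout: the keyword, a reference number, then an optional
# parenthetical holding either "sites" or one or more "bases X to Y" spans.
REFERENCE_RE = re.compile(r"^REFERENCE\s+(\d+)(?:\s+\((.*)\))?\s*$")
SPAN_RE = re.compile(r"bases\s+(\d+)\s+to\s+(\d+)")

def parse_reference(line):
    """Return (number, kind, spans) for a REFERENCE line.

    kind is "bases", "sites", or None when the parenthetical is absent.
    """
    match = REFERENCE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a REFERENCE line: {line!r}")
    number = int(match.group(1))
    detail = match.group(2)
    if detail is None:
        # Parenthetical omitted, as permitted from Release 123.0 onward.
        return number, None, []
    if detail.strip() == "sites":
        return number, "sites", []
    spans = [(int(start), int(end)) for start, end in SPAN_RE.findall(detail)]
    if not spans:
        raise ValueError(f"unrecognized parenthetical: {detail!r}")
    return number, "bases", spans

for example in (
    "REFERENCE   1  (bases 1 to 262290)",
    "REFERENCE   2  (sites)",
    "REFERENCE   3  (bases 1 to 456; bases 700 to 2334)",
    "REFERENCE   4",
):
    print(parse_reference(example))
```

Treating the parenthetical as an optional group keeps a single code path for both the current and the future form of the line.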

1.4.3 NCBI's ftp address will be changed

  At some point in the near future NCBI's ftp address will be changed.
The current address:

will become:

  Additional details about this change will be made available via these
release notes and the GenBank newsgroup (bionet.molbio.genbank) as they
become available.

1.4.4 Selenocysteine representation

  Selenocysteine residues within the protein translations of coding
region features have been represented in GenBank via the letter 'X'
and a /transl_except qualifier. At the May 1999 DDBJ/EMBL/GenBank
collaborative meeting, it was learned that IUPAC plans to adopt the
letter 'U' for selenocysteine.

  DDBJ, EMBL, and GenBank will thus use this new amino acid abbreviation
for their /translation qualifiers. Although a timetable for its appearance
has not been finalized, we are mentioning this now because the introduction
of a new residue abbreviation is a fairly fundamental change.

  Details about the use of 'U' will be made available via these release
notes and the GenBank newsgroup as they become available.
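One reason a new residue letter matters is that programs which validate protein translations against a fixed alphabet will start flagging 'U'. The sketch below illustrates that effect in Python. The alphabet sets, the function name `unexpected_residues`, and the sample translation are all hypothetical and are not taken from any GenBank record or tool.

```python
# The 20 standard one-letter amino acid codes, plus 'X' for an unknown residue.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY") | {"X"}
# 'U' is the letter that, per the notes above, IUPAC plans to adopt
# for selenocysteine.
EXTENDED_RESIDUES = STANDARD_RESIDUES | {"U"}

def unexpected_residues(translation, alphabet):
    """Return the sorted letters in translation that are not in alphabet."""
    return sorted(set(translation.upper()) - alphabet)

# Hypothetical translation containing one selenocysteine.
protein = "MKTUAGL"
print(unexpected_residues(protein, STANDARD_RESIDUES))  # ['U']
print(unexpected_residues(protein, EXTENDED_RESIDUES))  # []
```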

1.4.5 New REFERENCE type for on-line journals

  Agreement was reached at the May 1999 collaborative DDBJ/EMBL/GenBank
meeting that an effort should be made to accommodate references that are
published only on-line. Until specifications for such references are
available from library organizations, GenBank will present them in a manner
like this:

	REFERENCE   1  (bases 1 to 2858)
	  AUTHORS   Smith, J.
	  TITLE     Cloning and expression of a phospholipase gene
	  JOURNAL   Online Publication
	  REMARK    Online-Journal-name; Article Identifier; URL

  This format is still tentative; additional information about this new
reference type will be made available via these release notes.
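If the REMARK line keeps the three semicolon-separated fields shown above, it could be split as in the following Python sketch. Because the format is tentative, this is only an illustration: the function name `parse_online_remark`, the dictionary keys, and the sample values (including the example.org URL) are hypothetical.

```python
def parse_online_remark(remark):
    """Split 'Online-Journal-name; Article Identifier; URL' into a dict.

    Only the first two semicolons are treated as separators, so a URL
    that itself contains ';' is kept intact.
    """
    parts = [part.strip() for part in remark.split(";", 2)]
    if len(parts) != 3:
        raise ValueError(f"expected three ';'-separated fields: {remark!r}")
    journal, article_id, url = parts
    return {"journal": journal, "article_id": article_id, "url": url}

# Hypothetical REMARK content in the tentative layout.
sample = "Hypothetical Online Journal; e0001; http://example.org/e0001"
print(parse_online_remark(sample))
```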

