IMPORTANT: gzip to be used for GenBank data products

Mark Cavanaugh cavanaug at
Mon Jun 12 17:14:08 EST 2000

Greetings GenBank Users,

Effective on the official release date for GenBank Release 119.0 (August 15,
2000), NCBI will begin using the gzip compression utility for all of its
GenBank products.

In our comparisons of 'gzip' to the Unix 'compress' utility on simple sequence
data (e.g., EST, GSS, STS), gzip produced compressed files about 50% smaller
than those produced by compress. Given that EST and GSS sequences make up a
large portion of the GenBank data NCBI distributes, switching to gzip will
save a great deal of disk space and will reduce the bandwidth used by those
who ftp GenBank products.

The gzip utility is available for most major operating systems, either bundled
or as freeware/shareware, so we do not expect our switch to gzip to cause more
than minor inconvenience for our users.

One such inconvenience will be a change in file naming conventions. The suffix
of compressed GenBank data files (for both releases and GenBank Updates) is
currently ".Z". After our switch to gzip, the suffix will become ".gz".
For example:

	gbbct1.seq.Z -> gbbct1.seq.gz
	gbcu.flat.Z  -> gbcu.flat.gz
	nc0610.aso.Z -> nc0610.aso.gz
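
For scripts that keep lists of expected file names, the renaming above could
be sketched roughly as follows. This is only an illustration: the function
name to_gz_name is hypothetical and not part of any NCBI tool, and it assumes
a POSIX-style shell.

```shell
#!/bin/sh
# Hypothetical helper: map a compress-style name (.Z) to the
# corresponding gzip-style name (.gz). Names that do not end
# in .Z are printed unchanged.
to_gz_name() {
    case "$1" in
        *.Z) printf '%s\n' "${1%.Z}.gz" ;;
        *)   printf '%s\n' "$1" ;;
    esac
}

# Print the old -> new mapping for the example files above.
for f in gbbct1.seq.Z gbcu.flat.Z nc0610.aso.Z; do
    printf '%s -> %s\n' "$f" "$(to_gz_name "$f")"
done
```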

And of course, any automated scripts that obtain GenBank data products will
have to be modified, replacing commands like this:

	uncompress nc0610.flat.Z

with commands like this:

	gunzip nc0610.flat.gz
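
A script that may encounter both old and new files during the transition
could pick the command from the file suffix. The following is a minimal
sketch, not an NCBI-supplied script: the function name decompress_cmd is
hypothetical, and a POSIX-style shell is assumed.

```shell
#!/bin/sh
# Hypothetical helper: print the decompression command that matches
# the suffix of the given file name. Returns non-zero for any other
# suffix so that a calling script can stop or skip the file.
decompress_cmd() {
    case "$1" in
        *.gz) echo "gunzip" ;;
        *.Z)  echo "uncompress" ;;
        *)    echo "unknown suffix: $1" >&2; return 1 ;;
    esac
}

# Show which command would be run for an old-style and a new-style file.
for f in nc0610.flat.Z nc0610.flat.gz; do
    cmd=$(decompress_cmd "$f") && echo "$cmd $f"
done
```

Note that gunzip can also read files written by compress, so a script that
is switched to gunzip should still be able to handle any older ".Z" files
it has on hand.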

If you are unsure about the availability of gzip for your platform, please
contact your system administrator. If you find that the utility is not
installed, one possible place for obtaining gzip is:

Any questions or concerns that you have about this change should be directed
to NCBI's Service Desk:

	info at

Mark Cavanaugh

