Death of the GenBank floppy format?

David Kristofferson kristoff at genbank.bio.net
Thu Aug 1 13:05:49 EST 1991


>      I'm confused/concerned about an announcement I just received from
> IG Inc. announcing the imminent demise of the GenBank floppy-disk distribu-
> tion.  Does that mean that the GenBank compressed-file format is going to
> bite the dust also?  I get GenBank on CD-ROM and so the appended note in
> the announcement struck me especially:
> 
>      > Note to our CD-ROM customers:  CD's will be distributed quarterly
>      > through September 1992.  The April 1992 release 71.0 floppy disk
>      > format files will be included on each CD released after April.
> 
> Perhaps I'm reading this wrongly but it seems to mean that release 71 will
> be the last updated version of the compressed files, and all CD-ROM releases
> subsequent to that will contain the latest flat-file release, plus yet
> another rerelease of the r.71 compressed files.

Yes, your "translation" is correct.  The April 1992 release (71) is
the LAST release of the floppy disk format files, compressed format
files, binary files, or whatever other name you wish to call them by.
CDROMs for releases 72 in June and 73 in September will contain the
latest tape format (i.e., ASCII) flat file release, a repeat of floppy
format release 71, and the latest available GenPept data in ASCII tape
format.

This decision was made at the June joint GenBank/NCBI advisors meeting
after first consulting NIGMS, NCBI, and the one commercial developer
that uses the format.  There is no doubt that some people will find
this change disconcerting, as always happens with any change, but the
time was ripe to "bite the bullet on this" as advancing technology is
making continued support of this format far less attractive than other
options.

NCBI, the National Center for Biotechnology Information at the
National Library of Medicine in Bethesda (and the party responsible
for the future of GenBank from October 1992 on), has held developers'
meetings over the last year or so and now has a CDROM in beta release
as was announced by Dennis Benson recently on the
BIO-SOFTWARE/bionet.software newsgroup.

If these issues concern you, you should ***MAKE SURE THAT YOU STAY
INFORMED*** about developments at NCBI.  I also invite NCBI to utilize
the BIOSCI newsgroups to elaborate further on their plans.  Although
NCBI has maintained a mailing list, bits at bio.nlm.nih.gov, fewer than 40
messages have been posted to that forum in the last two years.
BIO-SOFTWARE/bionet.software is a widely read international forum and
would be a more effective vehicle for communication.  I am sure,
however, that despite all good efforts at public education there will
still be many who are caught completely unawares by impending changes
8-(.  Hopefully the more messages that are sent out, the smaller the
number of "surprises" will be.


> I'm reading this with a 
> logic text on my left (Lewis Carroll's) and my _Practical English Handbook_
> on my right.  Does anyone else read the announcement differently?

Perhaps the confusion is the result of reading a letter produced by
committee, but it apparently wasn't that bad because you succeeded in
interpreting it 8-).

>      If the compressed GenBank files really are going away anyone who uses
> them due to being short of disk space will have to make arrangements for
> someone to ZIP or otherwise compress each GenBank release for them, and
> also modify their favorite search/comparison software to be able to open and
> read ZIP (or whatever) files.  I foresee a proliferation of incompatible
> homemade compressed GenBanks, and Murphy sayeth that whatever you have on
> hand will NOT be what your slick new piece of software wants.

The "compressed" files are NOT ZIPped or "compressed" by any widely
available compression utility.  There is no possibility of "multiple
compressions" being produced by utility programs.  I use the term
"floppy format" files instead of "compressed" format because
this is a special binary format created by software at GenBank.  This
software was inherited from the first contractor in 1987 and has been
patched and revised many times as the database grew and "broke" the
code.  The code is not worth maintaining and should probably be
completely rewritten *if* resources were to be devoted to this.
However, NCBI has other format plans which make far more sense than a
continuation of this obsolete format.  My advice to any developer who
has not yet done so is to contact NCBI and find out what the future
holds before embarking on further developments.  Instead of trying to
rewrite programs to use a suboptimal format, your efforts should be
devoted to rewriting your code to read what NCBI plans for the future.
GenBank ASCII format will continue to be supported by NCBI, but there
are other plans as well.  Since I am not a spokesperson for NCBI,
either they should elaborate on this further in this forum or else
individuals should get their information by contacting NCBI.  It would
seem to be more efficient to do the former.

>      For testing purposes I use both versions of GenBank on this system.
> FYI, the compressed files on the release 67 CD-ROM take up 50.47 megabytes and
> the uncompressed ASCII files occupy 145.79 megabytes.

Once again, the files are NOT simply related by compression.  The
floppy format files have many of the annotation lines omitted in
addition to being binary files.

				Sincerely,

				Dave Kristofferson
				GenBank Manager

				kristoff at genbank.bio.net


