[Genbank-bb] GenBank Releases : New GenBank "catalog" (and related) files : Will replace old "index" files

Cavanaugh, Mark (NIH/NLM/NCBI) [E] via genbankb@net.bio.net (by cavanaug from ncbi.nlm.nih.gov)
Fri Dec 21 13:18:58 EST 2012


Greetings GenBank Users,

As described in the announcement for GenBank 193.0 availability,
we are providing files which catalog the contents of a release.

The genbank/catalog directory at the NCBI FTP site now contains
these files:

gb193.catalog.est.txt.gz
gb193.catalog.gss.txt.gz
gb193.catalog.other.txt.gz
gb193.gene_list.gss.txt.gz
gb193.gene_list.other.txt.gz
gb193.pmid_list.est.txt.gz
gb193.pmid_list.gss.txt.gz
gb193.pmid_list.other.txt.gz

The format and content of these files are described in Section 1.3.4
of the GenBank 193.0 release notes (gbrel.txt).

Note that there is no gene_list file for EST, because EST records
at the NCBI are not annotated with anything other than source
features.
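
For anyone scripting against the directory, the naming pattern in the
list above can be sketched as follows. This is an illustration only: the
helper name catalog_file_names is hypothetical, and it assumes that other
releases follow the same pattern as 193, which this announcement does not
state.

```python
def catalog_file_names(release):
    """Return the file names expected under genbank/catalog for a release.

    Hypothetical helper: it reproduces the eight names listed for
    release 193 and ASSUMES the same pattern for other release numbers.
    """
    kinds = ("catalog", "gene_list", "pmid_list")
    groups = ("est", "gss", "other")
    names = []
    for kind in kinds:
        for group in groups:
            # There is no gene_list file for EST, because EST records
            # carry only source features.
            if kind == "gene_list" and group == "est":
                continue
            names.append(f"gb{release}.{kind}.{group}.txt.gz")
    return names


if __name__ == "__main__":
    for name in catalog_file_names(193):
        print(name)
```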

There is one known issue involving the Division-Code field
of the catalog: finished sequence records that originated
in clone-based high-throughput genome sequencing (HTG) projects
have a division code of "HTG", even though those records may
have moved to another division (for example, PRI) upon
completion. We're considering a change that would let this
column contain multiple values, to reflect the fact that a
sequence can be categorized in multiple ways. For example:
"HTG,PRI" or "GSS,ENV".
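
If the multi-valued form is adopted, code that reads the Division-Code
field would need to handle both single and comma-separated values. A
minimal sketch, assuming a plain comma separator as in the examples
above (the function name split_division_codes is hypothetical, and the
final format may differ):

```python
def split_division_codes(field):
    """Split a Division-Code value into a list of individual codes.

    Accepts both the current single-value form ("HTG") and the
    proposed comma-separated form ("HTG,PRI"). The comma separator
    is an ASSUMPTION taken from the examples in the announcement.
    """
    return [code.strip() for code in field.split(",") if code.strip()]


if __name__ == "__main__":
    print(split_division_codes("HTG"))
    print(split_division_codes("HTG,PRI"))
```

Treating the field as a list from the start means a membership test such
as "PRI" in split_division_codes(value) behaves the same way under
either format.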

These products are clearly still in flux, so now would be
a good time to pass along any suggestions that you might
have for the content and structure of the catalog and
related files.

Mark Cavanaugh
GenBank
NCBI/NLM/NIH/HHS



