New LOCUS Format: Sample Flatfiles

cavanaug at ncbi.nlm.nih.gov
Tue Oct 16 18:28:38 EST 2001


Greetings GenBank Users,

A new LOCUS line format will be used for GenBank-format flatfiles
starting with GenBank Release 127.0 in December 2001.

The details of the new format are documented in Section 1.4 of the
GenBank release notes:

   ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt

and are appended for your reference at the end of this message.

Three sample flatfiles that use the new LOCUS format, and that together
contain a large number of the entries in the 10/12/2001 GenBank
Cumulative Update, are now available at the NCBI FTP site for testing
purposes:

   ftp://ftp.ncbi.nih.gov/genbank/new_locus_format/new_locus.1.gbff.gz
   ftp://ftp.ncbi.nih.gov/genbank/new_locus_format/new_locus.2.gbff.gz
   ftp://ftp.ncbi.nih.gov/genbank/new_locus_format/new_locus.3.gbff.gz
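For testing against these files, a minimal Python sketch like the one
below could flag LOCUS lines that do not span exactly 79 columns. It
assumes the files have been downloaded locally, and that a well-formed
new-style LOCUS line always ends with the date in columns 69-79. The
names `locus_lines` and `check_file` are hypothetical, not part of any
NCBI tool.

```python
import gzip
from typing import Iterator, List, TextIO, Tuple


def locus_lines(handle: TextIO) -> Iterator[str]:
    """Yield each LOCUS line from a GenBank flatfile, without its line ending."""
    for line in handle:
        if line.startswith("LOCUS"):
            yield line.rstrip("\r\n")


def check_file(path: str) -> List[Tuple[int, str]]:
    """Return (ordinal, line) pairs for LOCUS lines that are not 79 columns wide.

    Assumption: in the new format the date occupies columns 69-79, so a
    well-formed LOCUS line is exactly 79 characters long.
    """
    problems = []
    # "rt" opens the gzip stream in text mode; undecodable bytes are
    # replaced rather than aborting the scan.
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
        for ordinal, line in enumerate(locus_lines(handle), start=1):
            if len(line) != 79:
                problems.append((ordinal, line))
    return problems


if __name__ == "__main__":
    import sys

    # Usage (hypothetical local copy): python check_locus.py new_locus.1.gbff.gz
    for path in sys.argv[1:]:
        for ordinal, line in check_file(path):
            print(f"{path}: LOCUS line #{ordinal} has {len(line)} columns: {line!r}")
```

A length check is only a first pass; a line of the right width can still
have a field in the wrong columns.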

We have parsed these files and believe that the new LOCUS lines are
correctly formatted. If you encounter any problems, however, please
inform our Service Desk:

   info at ncbi.nlm.nih.gov

Mark Cavanaugh
GenBank
NCBI/NLM/NIH


New LOCUS format:

---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79
LOCUS       16Char_LocusName 99999999999 bp ss-snoRNA  circular DIV DD-MMM-YYYY

Positions  Contents
---------  --------
01-05      LOCUS
06-12      spaces
13-28      Locus name
29-29      space
30-40      Length of sequence, right-justified
41-41      space
42-43      bp
44-44      space
45-47      spaces, ss- (single-stranded), ds- (double-stranded), or
           ms- (mixed-stranded)
48-53      NA, DNA, RNA, tRNA (transfer RNA), rRNA (ribosomal RNA), 
           mRNA (messenger RNA), uRNA (small nuclear RNA), snRNA,
           snoRNA, scRNA. Left justified.
54-55      spaces
56-63      'linear' followed by two spaces, or 'circular'
64-64      space
65-67      The division code (see Section 3.3)
68-68      space
69-79      Date, in the form dd-MMM-yyyy (e.g., 15-MAR-1991)
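Because every field sits in fixed columns, a LOCUS line can be read by
slicing rather than by splitting on whitespace. The Python sketch below
assumes the column ranges in the table above, with the space before the
sequence length at column 29; `parse_locus` and `LOCUS_FIELDS` are
hypothetical names used only for illustration.

```python
from typing import Dict, Tuple

# 1-based, inclusive column ranges for the new LOCUS line (from the table).
LOCUS_FIELDS: Dict[str, Tuple[int, int]] = {
    "name": (13, 28),
    "length": (30, 40),
    "strandedness": (45, 47),
    "molecule": (48, 53),
    "topology": (56, 63),
    "division": (65, 67),
    "date": (69, 79),
}


def parse_locus(line: str) -> Dict[str, str]:
    """Split a new-format LOCUS line into its fields by column position."""
    if not line.startswith("LOCUS"):
        raise ValueError("not a LOCUS line")
    # Pad short lines so every slice is defined; trailing blanks are stripped below.
    padded = line.rstrip("\r\n").ljust(79)
    if padded[41:43] != "bp":
        raise ValueError("expected 'bp' in columns 42-43")
    # A 1-based inclusive range (start, end) is the Python slice [start-1:end].
    return {
        field: padded[start - 1:end].strip()
        for field, (start, end) in LOCUS_FIELDS.items()
    }
```

Splitting on whitespace would be fragile here: the strandedness prefix
is attached to the molecule type (as in `ss-RNA`), and a blank field such
as a missing strandedness leaves no token at all. Slicing keeps each
field in its place; `int(fields["length"])` then gives the sequence
length as a number.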

Here is how two existing records will appear in the new format:

LOCUS       AB000383                5423 bp    DNA     circular VRL 05-FEB-1999
DEFINITION  Leucania seperata nuclear polyhedrosis virus DNA for p13, xe,
            envelope protein, complete cds.
ACCESSION   AB000383

LOCUS       AF345888                 147 bp ss-RNA     linear   VRL 21-JUN-2001
DEFINITION  Chikungunya virus nonstructural protein 4 gene, partial cds.
ACCESSION   AF345888
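Conversely, software that writes GenBank-format output would need to pad
each field to its columns. The Python sketch below assumes the field
widths from the table above; `format_locus` is a hypothetical helper,
and it checks only the constraints stated in this message (a locus name
of at most 16 characters, a length of at most 11 digits, the three
strandedness prefixes, and 'linear' or 'circular').

```python
from typing import Union

STRANDEDNESS = ("", "ss-", "ds-", "ms-")
TOPOLOGIES = ("linear", "circular")


def format_locus(
    name: str,
    length: Union[int, str],
    molecule: str,
    topology: str,
    division: str,
    date: str,
    strandedness: str = "",
) -> str:
    """Build a 79-column LOCUS line in the new fixed-column layout."""
    length_text = str(length)
    if len(name) > 16:
        raise ValueError("locus name exceeds 16 characters (columns 13-28)")
    if len(length_text) > 11:
        raise ValueError("length exceeds 11 digits (columns 30-40)")
    if strandedness not in STRANDEDNESS:
        raise ValueError(f"strandedness must be one of {STRANDEDNESS}")
    if len(molecule) > 6:
        raise ValueError("molecule type exceeds 6 characters (columns 48-53)")
    if topology not in TOPOLOGIES:
        raise ValueError(f"topology must be one of {TOPOLOGIES}")
    if len(division) != 3 or len(date) != 11:
        raise ValueError("division must be 3 characters and date 11 (DD-MMM-YYYY)")
    return (
        "LOCUS".ljust(12)          # columns 1-12
        + name.ljust(16)           # 13-28, left-justified
        + " "                      # 29
        + length_text.rjust(11)    # 30-40, right-justified
        + " bp "                   # 41-44
        + strandedness.ljust(3)    # 45-47, blank when unspecified
        + molecule.ljust(6)        # 48-53, left-justified
        + "  "                     # 54-55
        + topology.ljust(8)        # 56-63, 'linear' gets two trailing spaces
        + " " + division + " "     # 64-68
        + date                     # 69-79
    )


print(format_locus("AB000383", 5423, "DNA", "circular", "VRL", "05-FEB-1999"))
print(format_locus("AF345888", 147, "RNA", "linear", "VRL", "21-JUN-2001",
                   strandedness="ss-"))
```

Under these assumptions, the two calls should reproduce the AB000383 and
AF345888 LOCUS lines shown above.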



---



- gttaacaattaaagagtgtttatcgaaattcattatatagtggtttatatagaccacttc
-
- GenBank newsgroup see: http://www.bio.net/hypermail/genbankb/       
- GENBANKB e-mail: messages sent to genbankb at net.bio.net
- subscribe: e-mail biosci-server at net.bio.net with: subscribe genbankb
- unsub: e-mail biosci-server at net.bio.net with: unsubscribe genbankb      
- GenBank on the WWW, see:  http://www.ncbi.nlm.nih.gov/Genbank/
- problems with GENBANKB? E-mail moderator: francis at cmmt.ubc.ca                  