Greetings GenBank Users,
A new LOCUS line format will be utilized for GenBank-format flatfiles
starting with GenBank Release 127.0 in December 2001.
The details of the new format are documented in Section 1.4 of the
GenBank release notes:
ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt
and are appended for your reference at the end of this message.
Three sample flatfiles, containing a large number of the entries
present in the 10/12/2001 GenBank Cumulative Update, and which utilize
the new LOCUS format, are now available at the NCBI FTP site for
purposes of testing:
ftp://ftp.ncbi.nih.gov/genbank/new_locus_format/new_locus.1.gbff.gz
ftp://ftp.ncbi.nih.gov/genbank/new_locus_format/new_locus.2.gbff.gz
ftp://ftp.ncbi.nih.gov/genbank/new_locus_format/new_locus.3.gbff.gz
We have parsed these files and believe that the new LOCUS lines are
correctly formatted. But if you encounter any problems, please inform our
Service Desk:
info at ncbi.nlm.nih.gov
Mark Cavanaugh
GenBank
NCBI/NLM/NIH
New LOCUS format:
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79
LOCUS       16Char_LocusName 99999999999 bp ss-snoRNA  circular DIV DD-MMM-YYYY
Positions  Contents
---------  --------
01-05      LOCUS
06-12      spaces
13-28      Locus name
29-29      space
30-40      Length of sequence, right-justified
41-41      space
42-43      bp
44-44      space
45-47      spaces, ss- (single-stranded), ds- (double-stranded), or
           ms- (mixed-stranded)
48-53      NA, DNA, RNA, tRNA (transfer RNA), rRNA (ribosomal RNA),
           mRNA (messenger RNA), uRNA (small nuclear RNA), snRNA,
           snoRNA, scRNA. Left justified.
54-55      spaces
56-63      'linear' followed by two spaces, or 'circular'
64-64      space
65-67      The division code (see Section 3.3)
68-68      space
69-79      Date, in the form dd-MMM-yyyy (e.g., 15-MAR-1991)
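
For those updating parsers, the column table above can be turned into fixed-position slices. The following Python sketch is only an illustration, not NCBI code: the names `Locus` and `parse_locus` are hypothetical, and it assumes the line is a full 79 columns wide with the single space falling at position 29 (between the locus name and the length).

```python
from typing import NamedTuple


class Locus(NamedTuple):
    """Fields of a new-format LOCUS line (hypothetical container)."""
    name: str
    length: int
    strand: str      # "", "ss-", "ds-" or "ms-"
    molecule: str    # e.g. "DNA", "mRNA", "snoRNA"
    topology: str    # "linear" or "circular"
    division: str
    date: str        # dd-MMM-yyyy, kept as text


def parse_locus(line: str) -> Locus:
    """Slice a LOCUS line by the 1-based columns given in the table."""
    line = line.rstrip("\n")
    if len(line) < 79 or not line.startswith("LOCUS"):
        raise ValueError("not a new-format LOCUS line")
    # 1-based inclusive columns a-b correspond to the Python slice [a-1:b].
    return Locus(
        name=line[12:28].strip(),      # 13-28
        length=int(line[29:40]),       # 30-40, right-justified
        strand=line[44:47].strip(),    # 45-47
        molecule=line[47:53].strip(),  # 48-53, left-justified
        topology=line[55:63].strip(),  # 56-63
        division=line[64:67],          # 65-67
        date=line[68:79],              # 69-79
    )
```

Slicing by column rather than splitting on whitespace keeps the strandedness field recoverable when it is blank, since a blank field simply yields an empty string.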
Here's how two existing records will appear using this new format:
LOCUS       AB000383                5423 bp    DNA     circular VRL 05-FEB-1999
DEFINITION  Leucania seperata nuclear polyhedrosis virus DNA for p13, xe,
            envelope protein, complete cds.
ACCESSION   AB000383

LOCUS       AF345888                 147 bp ss-RNA     linear   VRL 21-JUN-2001
DEFINITION  Chikungunya virus nonstructural protein 4 gene, partial cds.
ACCESSION   AF345888
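
For software that writes flatfiles, the same column layout can be applied in the other direction. This is a minimal Python sketch under stated assumptions: `format_locus` and its parameter names are hypothetical, the widths are taken from the positions table in this message, and the caller is assumed to supply the date already in dd-MMM-yyyy form.

```python
def format_locus(name, length, molecule, division, date,
                 strand="", topology="linear"):
    """Lay out a LOCUS line using the fixed column widths from the table."""
    if len(name) > 16:
        raise ValueError("locus name exceeds 16 characters")
    line = (
        "LOCUS" + " " * 7            # columns 01-12
        + f"{name:<16}" + " "        # 13-28, then space at 29
        + f"{length:>11}" + " "      # 30-40 right-justified, space at 41
        + "bp" + " "                 # 42-43, space at 44
        + f"{strand:<3}"             # 45-47, blank if not given
        + f"{molecule:<6}" + "  "    # 48-53 left-justified, spaces at 54-55
        + f"{topology:<8}" + " "     # 56-63, space at 64
        + f"{division:<3}" + " "     # 65-67, space at 68
        + date                       # 69-79
    )
    # Any over-wide field pushes the line past 79 columns.
    if len(line) != 79:
        raise ValueError("field too wide for the fixed-column layout")
    return line
```

Calling `format_locus("AF345888", 147, "RNA", "VRL", "21-JUN-2001", strand="ss-")` should reproduce the layout of the second sample LOCUS line above.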