We would like to announce the following important change in the EMBL
database in June this year.
At the time of release 87 (available from JUN-2006) the format of the
EMBL flat file will undergo a change: the ID line will have a different
structure (see below) and the SV line will be removed.
The changes affecting the ID line structure are:
* All tokens will be separated by a semicolon.
* The entry name will no longer be displayed; the primary accession
number will appear in its place.
* The sequence version will be indicated.
* The topology will be a separate token and will be indicated for
both circular and linear molecules.
* Both the data class and the taxonomic divisions will be displayed.
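During the transition, a script may need to tell old-style and new-style ID lines apart. The following is a minimal Python sketch, assuming only the features listed above: semicolon-separated tokens, with the second token being 'SV' plus the version number. The function name is_new_style_id_line is hypothetical and not part of any EBI tool.

```python
def is_new_style_id_line(line: str) -> bool:
    """Heuristic check for the release-87 ID line layout (hypothetical helper).

    Assumes the new layout: seven semicolon-separated tokens, the second
    of which is 'SV <version>'. Older ID lines have no SV token, so they
    fail this check.
    """
    if not line.startswith("ID "):
        return False
    # Drop the 'ID' line code, then split on the semicolon separators.
    fields = [f.strip() for f in line[2:].split(";")]
    # The last token ('500 BP.') ends with a period, not a semicolon.
    if len(fields) != 7:
        return False
    sv = fields[1].split()
    return len(sv) == 2 and sv[0] == "SV" and sv[1].isdigit()
```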
This is an example of the new ID line:
ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP.
     (1)       (2)   (3)     (4)          (5)  (6)  (7)
The tokens represent:
1. Primary accession number.
2. 'SV' + sequence version number.
3. Topology: 'circular' or 'linear'.
4. Molecule type.
5. Data class (ANN, CON, PAT, EST, GSS, HTC, HTG, MGA, WGS, TPA,
STS, STD; "normal" entries will have STD, for standard).
6. Taxonomic division (HUM, MUS, ROD, PRO, MAM, VRT, FUN, PLN, ENV,
INV, SYN, UNC, VRL, PHG).
7. Sequence length + 'BP.'.
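The token layout above maps directly onto a small parser. The following is a minimal Python sketch, assuming exactly the seven tokens described here; the names parse_id_line and IdLine are hypothetical and not part of any EBI software. The molecule type is not validated because its allowed values are not listed in this announcement. Since the SV line is removed, the sketch also rebuilds the 'accession.version' identifier from the ID line alone.

```python
from dataclasses import dataclass

# Controlled vocabularies as listed in the announcement.
DATA_CLASSES = {"ANN", "CON", "PAT", "EST", "GSS", "HTC", "HTG",
                "MGA", "WGS", "TPA", "STS", "STD"}
DIVISIONS = {"HUM", "MUS", "ROD", "PRO", "MAM", "VRT", "FUN", "PLN",
             "ENV", "INV", "SYN", "UNC", "VRL", "PHG"}


@dataclass
class IdLine:  # hypothetical container for the seven tokens
    accession: str
    version: int
    topology: str
    molecule_type: str
    data_class: str
    division: str
    length: int

    @property
    def versioned_accession(self) -> str:
        # 'accession.version', the identifier formerly given on the SV line
        return f"{self.accession}.{self.version}"


def parse_id_line(line: str) -> IdLine:
    """Parse a new-style ID line; raises ValueError on unexpected input."""
    if not line.startswith("ID "):
        raise ValueError("not an ID line")
    tokens = [t.strip() for t in line[2:].split(";")]
    if len(tokens) != 7:
        raise ValueError(f"expected 7 tokens, got {len(tokens)}")
    accession, sv, topology, mol_type, data_class, division, length = tokens

    sv_parts = sv.split()
    if len(sv_parts) != 2 or sv_parts[0] != "SV" or not sv_parts[1].isdigit():
        raise ValueError(f"bad sequence version token: {sv!r}")
    if topology not in ("circular", "linear"):
        raise ValueError(f"bad topology: {topology!r}")
    if data_class not in DATA_CLASSES:
        raise ValueError(f"unknown data class: {data_class!r}")
    if division not in DIVISIONS:
        raise ValueError(f"unknown division: {division!r}")
    if not length.endswith(" BP."):
        raise ValueError(f"bad length token: {length!r}")

    return IdLine(accession, int(sv_parts[1]), topology, mol_type,
                  data_class, division, int(length[:-len(" BP.")]))


rec = parse_id_line("ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP.")
print(rec.versioned_accession, rec.length)  # CD789012.4 500
```

Failing loudly on an unknown data class or division is a deliberate choice in this sketch: if the vocabularies change in a later release, the parser reports it rather than silently accepting new codes.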
The entry name will no longer be displayed in the ID line. Since EMBL
release 3 (Dec 1983), the stable identifier of an entry has been the
primary accession number.
A mapping file (entry name to accession number) will be provided with the
next release for those entries whose entry name does not coincide with
the accession number.
To give users a test dataset, a file with new-style ID lines, called
new_id_line.test.gz, was provided with the March release of the EMBL database.
Feedback from users is sought; please use the "Contact us" link at the
bottom of the EBI home page and specify "EMBL" in the feedback form.
Note: this information was first made available on our
"Forthcoming changes" page and in the EMBL database release notes.