Release 42 of the EMBL Nucleotide Sequence Database (March 1995) has been built
and installed on the EBI's e-mail, anonymous FTP, FASTA and WWW servers. It is
also already installed at some EMBnet sites.
Some extracts from the release notes are appended below.
Regards,
Peter Stoehr
EMBL - EBI
-----------
1 RELEASE 42
The EMBL nucleotide sequence database was frozen on 8th March 1995 to make
Release 42. The release contains 303,206 sequence entries comprising 262,559,786
nucleotides. This represents an increase of about 16% over Release 41. A
breakdown of Release 42 by taxonomic division is shown below:
Division          Entries  Nucleotides
----------------- ------- ------------
Bacteriophage        1066      1493417
ESTs               123526     39332522
Fungi                8420     19940449
Invertebrates       13831     27610495
Organelles           8195      9364254
Other Mammals        6272      6976315
Other Vertebrates    7041      8144622
Plants              11105     14145431
Primates            35290     36665648
Prokaryotes         21427     37074154
Rodents             23626     26850022
STSs                 7232      2288477
Synthetic            8597      4295284
Unclassified         6082      3577630
Viruses             21496     24801066
----------------- ------- ------------
Total              303206    262559786
plus:
Other patents        6686      2507063
----------------- ------- ------------
Grand Total        309892    265066849
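The table figures can be cross-checked mechanically. Below is a minimal Python sketch that transcribes the rows above into a dictionary, confirms that the divisions add up to the stated Total and, together with the patent row, to the Grand Total, and prints each division's share of the nucleotides in the main Total. The names `DIVISIONS`, `PATENTS` and `column_totals` are illustrative only and are not part of any EMBL distribution.

```python
# Release 42 figures transcribed from the table: division -> (entries, nucleotides).
DIVISIONS = {
    "Bacteriophage": (1066, 1493417),
    "ESTs": (123526, 39332522),
    "Fungi": (8420, 19940449),
    "Invertebrates": (13831, 27610495),
    "Organelles": (8195, 9364254),
    "Other Mammals": (6272, 6976315),
    "Other Vertebrates": (7041, 8144622),
    "Plants": (11105, 14145431),
    "Primates": (35290, 36665648),
    "Prokaryotes": (21427, 37074154),
    "Rodents": (23626, 26850022),
    "STSs": (7232, 2288477),
    "Synthetic": (8597, 4295284),
    "Unclassified": (6082, 3577630),
    "Viruses": (21496, 24801066),
}
# Patent entries are listed separately and only count towards the Grand Total.
PATENTS = (6686, 2507063)


def column_totals(rows):
    """Sum the entries and nucleotides columns of (entries, nucleotides) pairs."""
    entries = sum(e for e, _ in rows)
    nucleotides = sum(n for _, n in rows)
    return entries, nucleotides


total = column_totals(DIVISIONS.values())
grand_total = column_totals([*DIVISIONS.values(), PATENTS])

print("Total:", total)
print("Grand Total:", grand_total)

# Each division's share of the nucleotides in the main Total, largest first.
for name, (_, nucleotides) in sorted(
    DIVISIONS.items(), key=lambda item: item[1][1], reverse=True
):
    print(f"{name:<17} {100 * nucleotides / total[1]:5.1f}%")
```

The division rows sum exactly to the Total line (303,206 entries and 262,559,786 nucleotides), and adding the patent row gives the Grand Total.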
1.1 Literature Reference Identifiers
As previously announced, this release introduces identifiers into journal
references. We have created a new RX line type, which is optional for any
reference in the database, with the following format:
RX   database_name; identifier.
For example:
RN   [1]
RP   1-549
RX   MEDLINE; 82196900.
RA   Hennighausen L.G., Sippel A.E.;
RT   "Mouse whey acidic protein is a novel member of the family of
RT   'four-disulfide core' proteins";
RL   Nucleic Acids Res. 10:2677-2684(1982).
In this release, there are links to over 50,000 MEDLINE records.
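Because every RX line follows the same `database_name; identifier.` layout, extracting the literature links from an entry takes only a few lines. A minimal Python sketch, assuming each line is passed in as read from the flat file and that database names never contain a semicolon; `parse_rx` and `rx_links` are hypothetical helper names, not part of any EBI tool.

```python
import re

# "RX", whitespace, a database name up to the semicolon, then the identifier
# up to the terminating full stop. Database names are assumed to contain no ';'.
RX_LINE = re.compile(r"^RX\s+([^;]+);\s*(.+?)\.\s*$")


def parse_rx(line):
    """Return (database_name, identifier) for an RX line, or None otherwise."""
    match = RX_LINE.match(line)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def rx_links(entry_lines):
    """Collect every (database_name, identifier) pair found in one entry."""
    links = []
    for line in entry_lines:
        parsed = parse_rx(line)
        if parsed is not None:
            links.append(parsed)
    return links
```

Returning `None` for non-RX lines lets the same function be applied to every line of an entry, and keeping the database name as a plain string means identifiers from databases other than MEDLINE are handled the same way.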
2 FORTHCOMING CHANGES
2.1 Accession Numbers
It will soon be necessary to extend the range of accession numbers available
for the nucleotide sequence databases. This will be an important topic at an
upcoming collaborative meeting between EMBL, GenBank and DDBJ, and we do not
wish to preempt the outcome of that discussion. Inevitably, though, the present
structure of accession numbers, one prefix letter followed by five digits (e.g.
X12399), must change, and it is very likely that accession numbers will become
longer and contain more prefix letters. Existing accession numbers will remain
valid as they are. We will announce such a significant change as widely, and
with as much advance notice, as possible.
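The present accession number structure is simple enough to check with a regular expression, and a little arithmetic shows why its range is limited. This Python sketch covers only the current one-letter, five-digit form and makes no guess about the future format. The helper names are hypothetical, and the capacity figure is a theoretical upper bound that ignores how prefix letters are allocated among EMBL, GenBank and DDBJ.

```python
import re

# Current form only: one upper-case prefix letter followed by five digits, e.g. X12399.
# A longer format agreed with GenBank and DDBJ would need a different pattern.
CURRENT_ACCESSION = re.compile(r"[A-Z][0-9]{5}")

# Upper bound on distinct current-form numbers: 26 letters x 10**5 digit strings.
MAX_CURRENT_ACCESSIONS = 26 * 10**5


def is_current_accession(accession):
    """True if ACCESSION has exactly the one-letter, five-digit form."""
    return CURRENT_ACCESSION.fullmatch(accession) is not None


def split_accession(accession):
    """Split e.g. 'X12399' into ('X', '12399').

    The digits stay a string so that leading zeros are preserved.
    """
    if not is_current_accession(accession):
        raise ValueError(f"not a current-format accession number: {accession!r}")
    return accession[0], accession[1:]
```

Using `fullmatch` rather than `match` rejects strings with extra trailing characters, so a longer, future-style accession number is reported as not matching the current form instead of being silently truncated.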
2.2 EST Database Divisions
The number of EST sequences is growing rapidly and will continue to do so for
some time. To keep the data files to a manageable size, we propose to split the
EST division into several files named EST1.DAT, EST2.DAT, etc. at the next
release (Release 43, June 1995).
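Software that currently reads a single EST data file will need to iterate over the numbered files once the split takes effect. A minimal Python sketch, assuming the files sit together in one directory and use exactly the EST1.DAT, EST2.DAT naming proposed above; it relies on the flat-file convention that each entry ends with a `//` line. The function names and the directory layout are hypothetical.

```python
import re
from pathlib import Path


def order_split_names(names, stem="EST"):
    """Keep names like EST1.DAT, EST2.DAT, ... and sort them numerically,
    so that EST10.DAT follows EST9.DAT rather than EST1.DAT."""
    pattern = re.compile(re.escape(stem) + r"([0-9]+)\.DAT")
    numbered = []
    for name in names:
        match = pattern.fullmatch(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def iter_entries(line_sources):
    """Yield each entry as a list of lines; a '//' line ends an entry.
    Lines after the last '//' in a source are ignored."""
    for lines in line_sources:
        entry = []
        for line in lines:
            if line.rstrip() == "//":
                yield entry
                entry = []
            else:
                entry.append(line.rstrip("\n"))


def iter_split_division(directory, stem="EST"):
    """Read every entry of a split division from DIRECTORY, file by file."""
    directory = Path(directory)
    names = order_split_names((path.name for path in directory.iterdir()), stem)
    for name in names:
        with open(directory / name) as handle:
            yield from iter_entries([handle])
```

Sorting on the numeric suffix matters: a plain alphabetical sort would place EST10.DAT between EST1.DAT and EST2.DAT.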
2.3 Feature Identifiers
We are investigating ways of assigning unique identifiers to sequence features
described within the Feature Table. We will initially focus on CDS features, to
enable finer-grained cross-referencing between the nucleotide and protein
sequence databases than is currently possible. We hope to adopt a common
approach with our collaborators at DDBJ and GenBank.