EMBL Release 42 built

Peter Stoehr stoehr at ebi.ac.uk
Wed Mar 22 06:34:38 EST 1995


Release 42 of the EMBL Nucleotide Sequence Database (March 1995) has been built
and installed on the EBI's e-mail, anonymous FTP, FASTA and WWW servers. It has
also already been installed at some EMBnet sites.

Some extracts from the release notes are appended below.

Regards,
Peter Stoehr
EMBL - EBI
-----------

1  RELEASE 42

The EMBL Nucleotide Sequence Database was frozen to make Release 42 on 8 March
1995. The release contains 303,206 sequence entries comprising 262,559,786
nucleotides. This represents an increase of about 16% over Release 41. A
breakdown of Release 42 by taxonomic division is shown below:

                  Division             Entries    Nucleotides
                  -----------------    -------    ------------
                  Bacteriophage           1066         1493417
                  ESTs                  123526        39332522
                  Fungi                   8420        19940449
                  Invertebrates          13831        27610495
                  Organelles              8195         9364254
                  Other Mammals           6272         6976315
                  Other Vertebrates       7041         8144622
                  Plants                 11105        14145431
                  Primates               35290        36665648
                  Prokaryotes            21427        37074154
                  Rodents                23626        26850022
                  STSs                    7232         2288477
                  Synthetic               8597         4295284
                  Unclassified            6082         3577630
                  Viruses                21496        24801066
                  -----------------    -------    ------------
                  Total                 303206       262559786

                  plus:
                  Other patents           6686         2507063
                  -----------------    -------    ------------
                  Grand Total           309892       265066849
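
The totals in the table can be re-derived from the per-division figures. The
following is a minimal Python sketch of that check; the figures are copied from
the table above, while the names DIVISIONS, OTHER_PATENTS and totals are
illustrative only and are not part of any EMBL software.

```python
# Per-division figures copied from the Release 42 table:
# division name -> (entries, nucleotides).
DIVISIONS = {
    "Bacteriophage": (1066, 1493417),
    "ESTs": (123526, 39332522),
    "Fungi": (8420, 19940449),
    "Invertebrates": (13831, 27610495),
    "Organelles": (8195, 9364254),
    "Other Mammals": (6272, 6976315),
    "Other Vertebrates": (7041, 8144622),
    "Plants": (11105, 14145431),
    "Primates": (35290, 36665648),
    "Prokaryotes": (21427, 37074154),
    "Rodents": (23626, 26850022),
    "STSs": (7232, 2288477),
    "Synthetic": (8597, 4295284),
    "Unclassified": (6082, 3577630),
    "Viruses": (21496, 24801066),
}

# The "Other patents" row, listed separately from the main total.
OTHER_PATENTS = (6686, 2507063)


def totals(rows):
    """Return (entries, nucleotides) summed over (entries, nucleotides) pairs."""
    rows = list(rows)
    return sum(e for e, _ in rows), sum(n for _, n in rows)


release_total = totals(DIVISIONS.values())
grand_total = totals(list(DIVISIONS.values()) + [OTHER_PATENTS])

print("Total:      ", release_total)
print("Grand total:", grand_total)
```

Both sums agree with the "Total" and "Grand Total" rows printed in the table.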



1.1  Literature Reference Identifiers

As previously announced, this release introduces identifiers into journal
references. We have created a new RX line type, which is optional for any
reference in the database and has the following format:

RX   database_name; identifier.

e.g.

RN   [1]
RP   1-549
RX   MEDLINE; 82196900.
RA   Hennighausen L.G., Sippel A.E.;
RT   "Mouse whey acidic protein is a novel member of the family of
RT   'four-disulfide core' proteins";
RL   Nucleic Acids Res. 10:2677-2684(1982).

In this release, there are links to over 50,000 MEDLINE records.
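
For readers who parse the flat files themselves, the RX format above can be
read with a short pattern match. This is only a sketch based on the format
line and the single example shown here; the pattern and the function name
parse_rx are assumptions, not an official parser, and real entries may need
more tolerant handling.

```python
import re

# Assumed from the format "RX   database_name; identifier." given above:
# the line code "RX", whitespace, a database name up to the semicolon,
# then an identifier terminated by a full stop.
RX_PATTERN = re.compile(r"^RX\s+(?P<database>[^;]+);\s*(?P<identifier>.+?)\.\s*$")


def parse_rx(line):
    """Return (database_name, identifier) for an RX line, or None otherwise."""
    match = RX_PATTERN.match(line)
    if match is None:
        return None
    return match.group("database").strip(), match.group("identifier").strip()


# The reference block from the example above.
reference = [
    "RN   [1]",
    "RP   1-549",
    "RX   MEDLINE; 82196900.",
    "RA   Hennighausen L.G., Sippel A.E.;",
]

# Because the RX line is optional, a reference may yield an empty list.
links = [parsed for parsed in map(parse_rx, reference) if parsed is not None]
print(links)
```

For the example reference this yields one link, to MEDLINE record 82196900.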


2  FORTHCOMING CHANGES

2.1  Accession Numbers

It will soon be necessary to extend the range of possible accession numbers
available for the nucleotide sequence databases. This will be an important topic
at an upcoming collaborative meeting between EMBL, GenBank and DDBJ, and we do
not wish to preempt the result of that discussion. Inevitably, though, the
present structure of accession numbers, which consist of one prefix letter
followed by 5 digits (e.g. X12399), must change, and it is very likely that
accession numbers will become longer and contain more prefix letters. Existing
accession numbers will remain valid as they are. We will announce such a
significant change as widely, and with as much advance notice, as possible.
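
Software that validates accession numbers may hard-code the present structure,
so it is worth locating such checks before the format changes. The sketch below
shows what such a check typically looks like in Python. It covers only the
present structure described above (one prefix letter followed by 5 digits); the
extended format has not been decided and is deliberately not modelled. The
function name and the restriction to upper-case letters are assumptions.

```python
import re

# Present structure only: one prefix letter followed by 5 digits, e.g. X12399.
# Upper-case prefix letters are assumed here.
CURRENT_ACCESSION = re.compile(r"[A-Z][0-9]{5}")


def is_current_style_accession(text):
    """True if text has the present one-letter, five-digit structure."""
    return CURRENT_ACCESSION.fullmatch(text) is not None


print(is_current_style_accession("X12399"))    # present structure
print(is_current_style_accession("XY123456"))  # longer, hypothetical form
```

A check like this will reject any longer accession numbers, so code that
depends on it should be kept easy to update.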


2.2  EST Database Divisions

The number of EST sequences is growing rapidly and will continue to do so for
some time. To keep the size of the data files within reasonable limits for
handling purposes, we propose to split the EST division into several files named
EST1.DAT, EST2.DAT, etc. at the next release (Release 43, June 1995).
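
Scripts that currently read a single EST file will need to pick up a numbered
series instead. As a rough sketch, assuming only the EST1.DAT, EST2.DAT naming
proposed above, the files can be selected and ordered by their number rather
than alphabetically. The function name and the other file name in the usage
line are hypothetical.

```python
import re

# Matches the proposed names EST1.DAT, EST2.DAT, ... and captures the number.
EST_FILE = re.compile(r"EST([0-9]+)\.DAT")


def est_files_in_order(names):
    """Return the EST data files among names, sorted by their numeric suffix."""
    numbered = []
    for name in names:
        match = EST_FILE.fullmatch(name)
        if match is not None:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


# "OTHER.DAT" is a made-up name standing in for any non-EST file.
print(est_files_in_order(["EST10.DAT", "EST2.DAT", "OTHER.DAT", "EST1.DAT"]))
```

Sorting on the number keeps EST10.DAT after EST2.DAT, which a plain
alphabetical sort would not.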

2.3  Feature Identifiers

We are investigating ways of assigning unique identifiers to sequence features
described within the Feature Table. We will initially focus on CDS features to
enable a finer level of cross-referencing than at present between the nucleotide
and protein sequence databases. We are hoping to adopt a common approach with
our collaborators at DDBJ and GenBank.


