EMBL rel. 33 release notes (extracts)

stoehr at embl-heidelberg.de
Fri Dec 18 09:50:49 EST 1992

The following are some extracts from the release notes of release 33 of the
EMBL Nucleotide Sequence Database, December 1992. Comments are welcome, either
via the BIOSCI newsgroup bionet.molbio.embldatabank, or by e-mail to
datalib at embl-heidelberg.de

A breakdown of Release 33 by taxonomic division is shown below:

                  Division             Entries    Nucleotides
                  -----------------    -------    -----------
                  Bacteriophage            825        1077956
                  Fungi                   3675        6948386
                  Invertebrates           9014       11077413
                  Organelles              2925        4272437
                  Other Mammals           3075        4049195
                  Other Vertebrates       3859        4652366
                  Plants                  5369        6681381
                  Primates               22165       21151438
                  Prokaryotes            10926       18322905
                  Rodents                14189       15963099
                  Synthetic               1309        1096823
                  Unclassified            2658        2393182
                  Viruses                 9111       13727398
                  -----------------    -------    -----------
                  Total                  89100      111413979

This represents an increase of about 12% from release 32.


2.1  RA Line Author Name Format

As of Release 34 in March 1993, we will change the format of author names on RA
lines to conform to that used by major bibliographic databases such as Medline.
The main change is that the periods which currently follow each initial will no
longer appear.

For example, the current:

     RA   Wilson A., Smith B.G.;

will then appear as:

     RA   Wilson A, Smith BG;
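
For readers who maintain software that reads RA lines, the change can be
sketched in Python as follows. This is only an illustration based on the single
example above: the function name convert_ra_line is hypothetical, and the
sketch assumes that every period directly following a capital letter on an RA
line belongs to an initial.

```python
import re


def convert_ra_line(line):
    """Rewrite an old-style RA line without periods after initials.

    Assumption: on an RA line, a period that directly follows a capital
    letter always belongs to an initial. Other line types are returned
    unchanged.
    """
    if not line.startswith("RA"):
        return line
    # Drop each period that immediately follows a capital letter.
    return re.sub(r"(?<=[A-Z])\.", "", line)
```

Calling convert_ra_line("RA   Wilson A., Smith B.G.;") returns the new-style
line shown above. Names with unusual punctuation should be checked by hand.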

2.2  Patent Sequences

A collaboration between the EMBL Data Library and the European Patent Office
will enable us to include patent sequences in our quarterly distributions as
from Release 34 in March 1993. These sequences will each have a new patent
reference type, to document the source of the data. Patent-specific information
will appear in the RL line block (introduced by the keyword "Patent") and the
other reference linetypes (RN, RP, RC, RA, RT) will appear as usual. An example
of a patent RL line block is shown below:

     RL   Patent number EP0062971-A/1, 20-OCT-1982.

The date on the first line in the RL  block  is  the  publication  date  of  the
patent, and the following line(s) list the patent applicant(s).
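
A minimal Python sketch of reading the first line of such a block is given
here. It is derived only from the one example above, so the exact pattern
("Patent number", then the number, a comma, a DD-MON-YYYY date and a closing
period) is an assumption, and the name parse_patent_rl is hypothetical.
Applicant lines are not handled.

```python
import re
from datetime import date

# Month abbreviations as they appear in the example date (20-OCT-1982).
MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

# Assumed layout of the first RL line of a patent reference.
PATENT_RL = re.compile(
    r"^RL\s+Patent number (\S+), (\d{1,2})-([A-Z]{3})-(\d{4})\.$")


def parse_patent_rl(line):
    """Return (patent_number, publication_date), or None if no match."""
    match = PATENT_RL.match(line.strip())
    if match is None:
        return None
    number, day, month, year = match.groups()
    if month not in MONTHS:
        return None
    return number, date(int(year), MONTHS[month], int(day))
```

Lines that do not fit the assumed pattern yield None rather than an error, so
ordinary journal RL lines can be passed through the same function safely.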

2.3  Molecule Topology

As of Release 35 in June 1993, we will indicate molecules which are known to be
circular by prefixing the molecule type on the ID line with the keyword
"circular". Other topology keywords may appear in the same location in future
releases. Please note that the absence of the keyword "circular" should *not*
be taken to indicate that the molecule is known not to be circular.

An example of such a circular molecule's ID line is shown below:

     ID   CLSPC1     standard; circular DNA; ROD; 346 BP.
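
Programs that read the ID line may need a small adjustment to cope with the
new keyword. The Python sketch below is one possible approach, based only on
the example line above; the function name parse_id_topology is hypothetical,
and the sketch assumes that the molecule type is the second
semicolon-separated field after the entry name. Note that it reports an
unknown topology as None, not as "linear".

```python
def parse_id_topology(line):
    """Return (entry_name, molecule_type, topology) from an ID line.

    topology is "circular" when the keyword is present and None otherwise,
    because a missing keyword does not mean the molecule is linear.
    """
    line = line.strip()
    if not line.startswith("ID"):
        raise ValueError("not an ID line")
    # Entry name first, then semicolon-separated fields ending in a period.
    name, rest = line[2:].split(None, 1)
    fields = [field.strip() for field in rest.rstrip(".").split(";")]
    words = fields[1].split()
    if words and words[0] == "circular":
        return name, " ".join(words[1:]), "circular"
    return name, fields[1], None
```

For the example line above, the function returns the entry name CLSPC1, the
molecule type DNA and the topology "circular".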

2.4  Sequence Numbering

To make the sequence in database entries easier to read, we will insert base
numbers in columns 73-80 of each sequence line as of Release 35 in June 1993.
The numbers will be right justified, and will indicate the number of the last
base on each line. An example is shown below (the ruler is for your convenience
and will not appear in the database entries):

1       10        20        30        40        50        60        70        80
SQ   Sequence  245 BP; 60 A; 44 C; 77 G; 64 T; 0 other;
     agatcttctg ctcccaggag agagagcaat gtctagagta gggaaaagga ccatcttagc        60
     cctctactat aggcagctgt ctgctacccg tcactcacca atgggagagg aggcatgggt       120
     attgtgttca gatggggccc agtgttattt atttgagact ggatcagggt gagaacttga       180
     ggggaagggt tggagtagaa ggttatgatc tttctagaca gtgctgcatt ggtggcttga       240
     ctgac                                                                   245
1       10        20        30        40        50        60        70        80
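
The layout can be sketched in Python as follows. This is an illustration
inferred from the example above, not a definitive specification: it assumes a
five-column indent, six blocks of ten bases per line, and the base number
right justified in columns 73-80. The names format_sequence_lines and
read_sequence_line are hypothetical.

```python
def format_sequence_lines(sequence):
    """Lay out a sequence as 80-column lines with trailing base numbers."""
    lines = []
    for start in range(0, len(sequence), 60):
        chunk = sequence[start:start + 60]
        # Blocks of ten bases separated by single spaces.
        blocks = " ".join(chunk[i:i + 10] for i in range(0, len(chunk), 10))
        last_base = start + len(chunk)
        # Pad the bases to column 72, then right justify the number of the
        # last base on the line in columns 73-80.
        lines.append(("     " + blocks).ljust(72) + str(last_base).rjust(8))
    return lines


def read_sequence_line(line):
    """Return (bases, last_base_number) from a numbered sequence line."""
    bases = line[:72].replace(" ", "")
    return bases, int(line[72:80])
```

Existing parsers that take everything after the indent as sequence data will
need to ignore columns 73-80, as read_sequence_line does here.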

