The following are extracts from the release notes for Release 33 of the
EMBL Nucleotide Sequence Database, December 1992. Comments are welcome, either
via the BIOSCI newsgroup bionet.molbio.embldatabank, or by e-mail to
datalib at embl-heidelberg.de
--------------------------
A breakdown of Release 33 by taxonomic division is shown below:
Division          Entries Nucleotides
----------------- ------- -----------
Bacteriophage         825     1077956
Fungi                3675     6948386
Invertebrates        9014    11077413
Organelles           2925     4272437
Other Mammals        3075     4049195
Other Vertebrates    3859     4652366
Plants               5369     6681381
Primates            22165    21151438
Prokaryotes         10926    18322905
Rodents             14189    15963099
Synthetic            1309     1096823
Unclassified         2658     2393182
Viruses              9111    13727398
----------------- ------- -----------
Total               89100   111413979
This represents an increase of about 12% over Release 32.
2 FORTHCOMING CHANGES
2.1 RA Line Author Name Format
As from Release 34 in March 1993 we will change the format of author names on RA
lines to conform to that used by major bibliographic databases such as Medline.
The main change is that the periods which currently appear within initials will
no longer appear.
For example, the current:
RA   Wilson A., Smith B.G.;
will then appear as:
RA   Wilson A, Smith BG;
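For software that has to handle both styles, the change can be sketched as a
small conversion routine. This is only an illustration: the function name
convert_ra_line is hypothetical, and it assumes that every period directly
following an upper-case letter on an RA line belongs to an initial. It does not
depend on the amount of spacing after the line code.

```python
import re


def convert_ra_line(line: str) -> str:
    """Rewrite one RA line from dotted initials to undotted initials.

    Assumption (not from the release notes): a period that directly
    follows an upper-case letter is always part of an initial.
    """
    if not line.startswith("RA"):
        return line  # leave every other line type untouched
    # Drop only periods preceded by an upper-case letter, so the commas
    # between authors and the terminating semicolon are preserved.
    return re.sub(r"(?<=[A-Z])\.", "", line)


print(convert_ra_line("RA   Wilson A., Smith B.G.;"))
```

Applied to the current-style example above, the routine yields the new-style
line.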
2.2 Patent Sequences
A collaboration between the EMBL Data Library and the European Patent Office
will enable us to include patent sequences in our quarterly distributions as
from Release 34 in March 1993. These sequences will each have a new patent
reference type, to document the source of the data. Patent-specific information
will appear in the RL line block (introduced by the keyword "Patent") and the
other reference linetypes (RN, RP, RC, RA, RT) will appear as usual. An example
of a patent RL line block is shown below:
RL   Patent number EP0062971-A/1, 20-OCT-1982.
RL   IMPERIAL CHEMICAL INDUSTRIES PLC.
RL   UNIVERSITY OF LEICESTER.
The date on the first line in the RL block is the publication date of the
patent, and the following line(s) list the patent applicant(s).
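A parser for such a block might look like the sketch below. The names PatentRef
and parse_patent_rl are hypothetical, and two details are assumptions drawn
only from the single example above: that the first line always reads "Patent
number <number>, <DD-MON-YYYY>." and that the period at the end of each
applicant line is a terminator rather than part of the name.

```python
import re
from typing import List, NamedTuple


class PatentRef(NamedTuple):
    number: str            # patent number, e.g. "EP0062971-A/1"
    date: str              # publication date of the patent
    applicants: List[str]  # one entry per applicant line


# Assumed layout of the first RL line, inferred from the one example given.
PATENT_RE = re.compile(r"^Patent number\s+([^,\s]+),\s*(\d{2}-[A-Z]{3}-\d{4})\.$")


def parse_patent_rl(lines: List[str]) -> PatentRef:
    """Parse the RL lines of one patent reference."""
    # Strip the two-character line code and the blanks that follow it.
    body = [ln[2:].strip() for ln in lines if ln.startswith("RL")]
    if not body:
        raise ValueError("no RL lines found")
    match = PATENT_RE.match(body[0])
    if match is None:
        raise ValueError("not a patent RL block")
    # Assumption: the trailing period only terminates the applicant line.
    applicants = [text.rstrip(".") for text in body[1:]]
    return PatentRef(match.group(1), match.group(2), applicants)


ref = parse_patent_rl([
    "RL   Patent number EP0062971-A/1, 20-OCT-1982.",
    "RL   IMPERIAL CHEMICAL INDUSTRIES PLC.",
    "RL   UNIVERSITY OF LEICESTER.",
])
print(ref.number, ref.date, ref.applicants)
```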
2.3 Molecule Topology
As from Release 35 in June 1993 we will indicate molecules which are known to be
circular by prefixing the molecule type on the ID line with the keyword
"circular". Other topology keywords may appear in the same location in future
releases. Please note that the absence of the keyword "circular" should *not*
be taken to indicate that the molecule is definitely known not to be circular.
An example of such a circular molecule's ID line is shown below:
ID   CLSPC1     standard; circular DNA; ROD; 346 BP.
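Programs that read the ID line will need to allow for the optional keyword. The
following is a minimal sketch, not a reference parser: the names IdLine and
parse_id_line and the field labels are hypothetical, and the field order is
taken from the single example above. Only "circular" is recognised, and a
missing keyword is reported as None (topology not stated) rather than as
linear, in keeping with the note above.

```python
from typing import NamedTuple, Optional


class IdLine(NamedTuple):
    name: str
    data_class: str
    topology: Optional[str]  # "circular", or None when no keyword is given
    molecule: str
    division: str
    length: int


def parse_id_line(line: str) -> IdLine:
    """Split an ID line into its fields, allowing for a topology keyword."""
    if not line.startswith("ID"):
        raise ValueError("not an ID line")
    # Drop the line code and the final period, then split on semicolons.
    fields = [f.strip() for f in line[2:].strip().rstrip(".").split(";")]
    name, data_class = fields[0].split()
    words = fields[1].split()
    if len(words) > 1 and words[0] == "circular":
        topology, molecule = "circular", " ".join(words[1:])
    else:
        # No keyword: the topology is simply not stated.
        topology, molecule = None, " ".join(words)
    length = int(fields[3].split()[0])  # "346 BP" -> 346
    return IdLine(name, data_class, topology, molecule, fields[2], length)


print(parse_id_line("ID   CLSPC1     standard; circular DNA; ROD; 346 BP."))
```

Returning None instead of "linear" keeps callers from reading more into a
missing keyword than the entry actually states.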
2.4 Sequence Numbering
To aid reading the sequence bases in database entries, we will insert base
numbers in columns 73-80 of each sequence line as from Release 35 in June 1993.
The numbers will be right justified, and will indicate the number of the last
base on each line. An example is shown below (the ruler is for your convenience
and will not appear in the database entries):
1       10        20        30        40        50        60        70        80
+--------+---------+---------+---------+---------+---------+---------+---------+
SQ   Sequence 245 BP; 60 A; 44 C; 77 G; 64 T; 0 other;
     agatcttctg ctcccaggag agagagcaat gtctagagta gggaaaagga ccatcttagc        60
     cctctactat aggcagctgt ctgctacccg tcactcacca atgggagagg aggcatgggt       120
     attgtgttca gatggggccc agtgttattt atttgagact ggatcagggt gagaacttga       180
     ggggaagggt tggagtagaa ggttatgatc tttctagaca gtgctgcatt ggtggcttga       240
     ctgac                                                                   245
//
+--------+---------+---------+---------+---------+---------+---------+---------+
1       10        20        30        40        50        60        70        80
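The numbering scheme of section 2.4 can be sketched in a few lines of code. The
function name format_sequence_lines is hypothetical. The five leading blanks,
the blocks of ten bases and the sixty bases per line are assumptions inferred
from the example; the release notes themselves specify only that the number is
right justified in columns 73-80 and gives the last base on the line.

```python
from typing import List


def format_sequence_lines(seq: str, per_line: int = 60, block: int = 10) -> List[str]:
    """Lay out sequence lines with the last-base number in columns 73-80."""
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        # Group the bases into blank-separated blocks of ten.
        blocks = " ".join(chunk[i:i + block] for i in range(0, len(chunk), block))
        last_base = start + len(chunk)
        # 5 leading blanks + bases padded to column 72, then the number
        # right justified in the 8 columns 73-80.
        lines.append(f"     {blocks:<67}{last_base:>8}")
    return lines


for line in format_sequence_lines("acgt" * 16):
    print(line)
```

Because every line is padded to the same width, a reader that ignores columns
73-80 sees the same bases as before the change.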