Release 34 of the EMBL Nucleotide Sequence Database (March 1993) is
being distributed on CD-ROM and magnetic tape. All CDs should now be on
their way. I attach some extracts from the release notes concerning upcoming
changes, namely:
- new EST division, UNA becomes UNC (unclassified)
- circular molecules to be indicated
- sequence numbering introduced
- feature table changes, new 'source' key replaces several others.
Changes To Database Divisions
Input from our user community indicates that a separation of "EST" data from the
main data collection is advisable. In Release 35 in June 1993 we will introduce
a new database division called EST to include all sequences marked by the
keywords "expressed sequence tag" or "transcribed sequence fragment". These
sequences are often determined by single-strand, single-read sequencing methods
only and generally carry little or no annotation. The creation of the
new EST division will enable users of our database to easily exclude this kind
of data from their analyses, if so desired.
The "Unclassified" division will be renamed from UNA to UNC to reflect the fact
that it contains sequences which are not taxonomically classified.
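As an illustration only, the division changes above could be applied to an entry roughly as follows. This is a minimal sketch, not the Data Library's procedure: the function names are hypothetical, the KW line layout (semicolon-separated keywords ending in a period) is assumed, and the case-insensitive matching and the precedence of EST over the UNA/UNC rename are assumptions as well.

```python
# Keywords that, per the release notes, mark an entry for the new EST division.
EST_KEYWORDS = {"expressed sequence tag", "transcribed sequence fragment"}


def parse_keywords(kw_lines):
    """Collect keywords from KW lines.

    Assumes keywords are separated by semicolons and the list ends with a period.
    Keywords are lower-cased so that matching is case-insensitive (an assumption).
    """
    text = " ".join(line[2:].strip() for line in kw_lines if line.startswith("KW"))
    return [kw.strip().lower() for kw in text.rstrip(".").split(";") if kw.strip()]


def target_division(current_division, keywords):
    """Return the division code an entry would carry from Release 35 onwards."""
    if any(kw in EST_KEYWORDS for kw in keywords):
        return "EST"          # EST takes precedence here (assumption)
    if current_division == "UNA":
        return "UNC"          # Unclassified division is renamed
    return current_division


if __name__ == "__main__":
    kws = parse_keywords(["KW   expressed sequence tag; cDNA."])
    print(target_division("PRI", kws))
    print(target_division("UNA", []))
```

Software that wants to skip EST data can then simply filter on the division code rather than inspecting keywords.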
Molecule Topology
As previously announced we will indicate molecules which are known to be
circular by prefixing the molecule type on the ID line with the keyword
"circular" as from Release 35 in June 1993.
Please note that the absence of the keyword "circular" should *not* be taken
to indicate that the molecule is definitely known not to be circular.
An example of such a circular molecule's ID line is shown below:
ID   CLSPC1     standard; circular DNA; ROD; 346 BP.
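Programs that read the ID line may need to tolerate the optional keyword. The sketch below is one hedged way to do so; the function name and the returned dictionary are hypothetical, and the field layout is inferred from the example line above rather than from a formal specification. Note that a missing keyword is reported as "not stated", in line with the caveat above.

```python
def parse_id_line(line):
    """Split an ID line into fields, reporting the topology keyword separately."""
    if not line.startswith("ID"):
        raise ValueError("not an ID line")
    body = line[2:].strip().rstrip(".")
    # Expected fields: "name class; molecule; division; length BP"
    name_and_class, molecule, division, length = [f.strip() for f in body.split(";")]
    name, data_class = name_and_class.split()
    words = molecule.split()
    circular = bool(words) and words[0] == "circular"
    mol_type = " ".join(words[1:]) if circular else molecule
    return {
        "name": name,
        "class": data_class,
        "circular": circular,   # False means "not stated", NOT "known to be linear"
        "molecule": mol_type,
        "division": division,
        "length": int(length.split()[0]),
    }


if __name__ == "__main__":
    print(parse_id_line("ID   CLSPC1     standard; circular DNA; ROD; 346 BP."))
```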
Sequence Numbering
To aid reading the sequence bases in database entries, we will insert base
numbers in columns 73-80 of each sequence line as from Release 35 in June 1993.
The numbers will be right justified, and will indicate the number of the last
base on each line. An example is shown below (the ruler is for your convenience
and will not appear in the database entries):
1       10        20        30        40        50        60        70        80
+--------+---------+---------+---------+---------+---------+---------+---------+
SQ   Sequence 245 BP; 60 A; 44 C; 77 G; 64 T; 0 other;
     agatcttctg ctcccaggag agagagcaat gtctagagta gggaaaagga ccatcttagc        60
     cctctactat aggcagctgt ctgctacccg tcactcacca atgggagagg aggcatgggt       120
     attgtgttca gatggggccc agtgttattt atttgagact ggatcagggt gagaacttga       180
     ggggaagggt tggagtagaa ggttatgatc tttctagaca gtgctgcatt ggtggcttga       240
     ctgac                                                                   245
//
+--------+---------+---------+---------+---------+---------+---------+---------+
1       10        20        30        40        50        60        70        80
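For readers who maintain their own parsers or writers, the numbering scheme can be sketched as below. This is an illustration under stated assumptions, not the Data Library's formatting code: the function names are hypothetical, and the five leading blanks and the 6 x 10 base grouping are taken from the usual entry layout rather than from the text above. Only the placement of the number (right-justified in columns 73-80, giving the last base on the line) comes from the release notes.

```python
def format_sequence_lines(seq, per_line=60, block=10):
    """Return sequence lines with the last base number in columns 73-80."""
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        blocks = " ".join(chunk[i:i + block] for i in range(0, len(chunk), block))
        last_base = start + len(chunk)
        # Pad the bases out to column 72, then right-justify the number in 8 columns.
        lines.append(("     " + blocks).ljust(72) + str(last_base).rjust(8))
    return lines


def bases_from_line(line):
    """Recover the bases from a sequence line, ignoring columns 73-80.

    Also works for lines written without numbers, since those end before column 73.
    """
    return "".join(line[:72].split())


if __name__ == "__main__":
    for out in format_sequence_lines("acgt" * 61 + "a"):
        print(out)
```

Parsers that previously read whole sequence lines and stripped blanks will pick up digits as sequence characters unless they ignore columns 73-80, as `bases_from_line` does.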
Feature Table Changes
The following changes to the common DDBJ/EMBL/GenBank feature table will be
implemented on April 1st 1993 and will therefore be reflected at Release 35 in
June 1993. They are described more fully in the document "The DDBJ/EMBL/GenBank
Feature Table: Definition", which is available either in printed form from the
EMBL Data Library or electronically as a compressed PostScript file from our
anonymous FTP server:
FTP.EMBL-Heidelberg.DE (Internet address)
/pub/databases/embl/doc/FTv1.04.ps.Z (file name)
Discontinued feature keys (superseded by the "source" key):
provirus, cellular, transposon, insertion_seq.
New feature keys:
source     identifies the biological source of the specified span of
           the sequence.
STS        Sequence Tagged Site
V_segment  variable segment of immunoglobulin light and heavy chains,
           and T-cell receptor alpha, beta and gamma chains.
D_segment  diversity segment of immunoglobulin heavy chain, and T-cell
           receptor beta chain.
J_segment  joining segment of immunoglobulin light and heavy chains,
           and T-cell receptor alpha, beta and gamma chains.
S_region   switch region of immunoglobulin heavy chains.
N_region   extra nucleotides inserted between rearranged immunoglobulin
           segments.
C_region   constant region of immunoglobulin light and heavy chains,
           and T-cell receptor alpha, beta and gamma chains.
V_region   variable region of immunoglobulin light and heavy chains,
           and T-cell receptor alpha, beta and gamma chains.
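Software that dispatches on feature keys may want to flag the discontinued ones and recognise the new ones. The following is a minimal sketch with hypothetical function names; it assumes that a feature key starts in column 6 of an FT line and that continuation lines leave that column blank. It only reports discontinued keys and makes no attempt to convert them, since the conversion rules are given in the Feature Table definition document, not here.

```python
# Key lists copied from the release notes above.
DISCONTINUED_KEYS = {"provirus", "cellular", "transposon", "insertion_seq"}
NEW_KEYS = {
    "source", "STS", "V_segment", "D_segment", "J_segment",
    "S_region", "N_region", "C_region", "V_region",
}


def feature_keys(ft_lines):
    """Yield the key from each FT line that starts a new feature.

    Assumes the key begins in column 6; continuation lines are blank there.
    """
    for line in ft_lines:
        if line.startswith("FT") and len(line) > 5 and not line[5].isspace():
            yield line[5:].split()[0]


def discontinued_in(ft_lines):
    """Return the discontinued keys found, in order of appearance."""
    return [key for key in feature_keys(ft_lines) if key in DISCONTINUED_KEYS]


if __name__ == "__main__":
    example = [
        "FT   transposon      1..500",
        'FT                   /note="example qualifier"',
        "FT   CDS             20..300",
    ]
    print(discontinued_in(example))
```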