EMBL Release 34

stoehr at embl-heidelberg.de
Sat Apr 10 16:17:11 EST 1993


Release 34 of the EMBL Nucleotide Sequence Database (March 1993) is
being distributed on CD-ROM and magnetic tape. All CDs should now be on
their way. I attach some extracts from the release notes concerning upcoming
changes, namely:
 - new EST division, UNA becomes UNC (unclassified)
 - circular molecules to be indicated
 - sequence numbering introduced
 - feature table changes, new 'source' key replaces several others.

Changes To Database Divisions

Input from our user community indicates that a separation of "EST" data from the
main data collection is advisable.  In Release 35 in June 1993 we will introduce
a new database division called EST  to  include  all  sequences  marked  by  the
keywords  "expressed  sequence  tag"  or "transcribed sequence fragment".  These
sequences are often determined by single-strand, single-read sequencing  methods
only  and generally include no or only marginal annotation.  The creation of the
new EST division will enable users of our database to easily exclude  this  kind
of data from their analyses, if so desired.

The "Unclassified" division will be renamed from UNA to UNC to reflect the  fact
that it contains sequences which are not taxonomically classified.
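
For users who want to prepare their tools for the new division, the following
is a minimal Python sketch of excluding EST entries and accepting both the old
and new "Unclassified" codes. It assumes that the division code is the third
semicolon-separated field of the ID line (as in the ID line example given under
"Molecule Topology") and that EST entries will carry the code "EST" there. The
function names are illustrative only and are not part of any EMBL software.

```python
def division_of(id_line):
    """Return the division code from an ID line, or None if it cannot be read.

    Assumed field layout (semicolon separated):
    name and data class; molecule type; division; length.
    """
    if not id_line.startswith("ID"):
        return None
    fields = [field.strip() for field in id_line[2:].split(";")]
    if len(fields) < 4:
        return None
    return fields[2]


def normalise_division(code):
    """Map the old 'Unclassified' code UNA to its new name UNC."""
    return "UNC" if code == "UNA" else code


def keep_entry(id_line, excluded=("EST",)):
    """True if the entry's division is not in the excluded set."""
    return normalise_division(division_of(id_line)) not in excluded


if __name__ == "__main__":
    line = "ID   CLSPC1     standard; circular DNA; ROD; 346 BP."
    print(division_of(line), keep_entry(line))
```

Entries whose ID line cannot be parsed are kept rather than dropped, so that an
unexpected format does not silently remove data from an analysis.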


Molecule Topology

As previously announced we  will  indicate  molecules  which  are  known  to  be
circular  by  prefixing  the  molecule  type  on  the  ID  line with the keyword
"circular" as from Release 35 in June 1993.
Please note that the absence of the keyword "circular" should *not* be  taken 
to  indicate  that  the  molecule  is definitely known not to be circular.

An example of such a circular molecule's ID line is shown below:

     ID   CLSPC1     standard; circular DNA; ROD; 346 BP.
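
A parser could read the new keyword along the following lines. This is a
sketch only: it assumes the molecule type is the second semicolon-separated
field of the ID line, as in the example above, and the function name is
illustrative. The returned flag means "marked as circular"; a False value says
nothing about the true topology.

```python
def parse_molecule_field(id_line):
    """Return (marked_circular, molecule_type) for an ID line.

    marked_circular is True only when the molecule type is prefixed by the
    keyword "circular". False means the topology is not stated, not that
    the molecule is known to be linear.
    """
    fields = [field.strip() for field in id_line.split(";")]
    if len(fields) < 2:
        raise ValueError("unexpected ID line: %r" % id_line)
    words = fields[1].split()
    if words and words[0].lower() == "circular":
        return True, " ".join(words[1:])
    return False, fields[1]


if __name__ == "__main__":
    print(parse_molecule_field(
        "ID   CLSPC1     standard; circular DNA; ROD; 346 BP."))
```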


Sequence Numbering

To aid reading the sequence bases in  database  entries,  we  will  insert  base
numbers  in columns 73-80 of each sequence line as from Release 35 in June 1993.
The numbers will be right justified, and will indicate the number  of  the  last
base on each line.  An example is shown below (the ruler is for your convenience
and will not appear in the database entries):

1       10        20        30        40        50        60        70        80
+--------+---------+---------+---------+---------+---------+---------+---------+
SQ   Sequence  245 BP; 60 A; 44 C; 77 G; 64 T; 0 other;
     agatcttctg ctcccaggag agagagcaat gtctagagta gggaaaagga ccatcttagc        60
     cctctactat aggcagctgt ctgctacccg tcactcacca atgggagagg aggcatgggt       120
     attgtgttca gatggggccc agtgttattt atttgagact ggatcagggt gagaacttga       180
     ggggaagggt tggagtagaa ggttatgatc tttctagaca gtgctgcatt ggtggcttga       240
     ctgac                                                                   245
//
+--------+---------+---------+---------+---------+---------+---------+---------+
1       10        20        30        40        50        60        70        80
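
Programs that read sequence lines by stripping blanks will need to ignore the
new number field. The sketch below, in Python, shows one way to do this,
assuming only what is stated above: the base number is right justified in
columns 73-80 and gives the number of the last base on the line. The function
names are illustrative. Lines without a number (the current format) are also
accepted.

```python
def parse_sequence_line(line):
    """Split one sequence line into (bases, last_base_number).

    Columns 1-72 are treated as sequence data and columns 73-80 as the
    right-justified base number. last_base_number is None if the number
    field is blank, as in entries written before the change.
    """
    padded = line.rstrip("\n").ljust(80)
    bases = padded[:72].replace(" ", "")
    number_field = padded[72:80].strip()
    last_base = int(number_field) if number_field else None
    return bases, last_base


def check_numbering(lines):
    """Return the total number of bases, checking each base number on the way."""
    total = 0
    for line in lines:
        bases, last_base = parse_sequence_line(line)
        total += len(bases)
        if last_base is not None and last_base != total:
            raise ValueError(
                "line ends at base %d but is numbered %d" % (total, last_base))
    return total
```

Slicing by column rather than splitting on blanks keeps the base number out of
the sequence data even if a program otherwise accepts any non-blank character
as a base.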


Feature Table Changes

The following changes to the common  DDBJ/EMBL/GenBank  feature  table  will  be
implemented  on  April 1st 1993 and will therefore be reflected at Release 35 in
June 1993.  They are more fully documented in a document "The  DDBJ/EMBL/GenBank
Feature  Table:   Definition" which is available either in printed form from the
EMBL Data Library, or electronically as a compressed postscript  file  from  our
anonymous FTP server:

     FTP.EMBL-Heidelberg.DE                     (Internet address)
     /pub/databases/embl/doc/FTv1.04.ps.Z       (file name)

Discontinued feature keys (superseded by "source" key):

    provirus, cellular, transposon, insertion_seq.
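
Software that handles feature keys by name may need to recognise the
discontinued keys in older data. The Python sketch below reports where they
occur in a feature table. It assumes the usual flat-file layout in which
feature table lines begin with "FT" and the key occupies columns 6-20; this
layout is not described in these notes and should be checked against the
Feature Table Definition. The names used are illustrative. The sketch only
locates the old keys and does not attempt to rewrite them.

```python
DISCONTINUED_KEYS = {"provirus", "cellular", "transposon", "insertion_seq"}


def feature_key(ft_line):
    """Return the feature key on an FT line, or None if the line has no key.

    Assumes the key sits in columns 6-20; continuation lines (location or
    qualifier text only) leave those columns blank.
    """
    if not ft_line.startswith("FT"):
        return None
    key = ft_line[5:20].strip()
    return key or None


def find_discontinued(ft_lines):
    """Return (line_number, key) pairs for features using a discontinued key."""
    hits = []
    for number, line in enumerate(ft_lines, 1):
        key = feature_key(line)
        if key in DISCONTINUED_KEYS:
            hits.append((number, key))
    return hits
```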

New feature keys:

    source        identifies the biological source of the specified span of
                  the sequence.

    STS           Sequence Tagged Site

    V_segment     variable segment of immunoglobulin light and heavy chains,
                  and T-cell receptor alpha, beta and gamma chains.

    D_segment     diversity segment of immunoglobulin heavy chain, and T-cell
                  receptor beta chain.

    J_segment     joining segment of immunoglobulin light and heavy chains,
                  and T-cell receptor alpha, beta and gamma chains.

    S_region      switch region of immunoglobulin heavy chains.

    N_region      extra nucleotides inserted between rearranged immunoglobulin
                  segments.

    C_region      constant region of immunoglobulin light and heavy chains,
                  and T-cell receptor alpha, beta and gamma chains.

    V_region      variable region of immunoglobulin light and heavy chains,
                  and T-cell receptor alpha, beta and gamma chains.


