From owner-embldatabank@net.bio.net Thu Dec 03 22:00:00 1992
Path: biosci!daresbury!news
From: NETHELP@EMBL-Heidelberg.DE
Newsgroups: bionet.molbio.embldatabank
Subject: EMBL File Server Newsletter No. 9, Dec. 4th 1992
Message-ID: <1992Dec4.110447.15329@gserv1.dl.ac.uk>
Date: 4 Dec 92 11:03:17 GMT
Sender: Peter.Stoehr@DE.EMBL-Heidelberg
Distribution: bionet
Organization: European Molecular Biology Laboratory, Heidelberg
Lines: 443
Original-To: embl-db@uk.ac.daresbury
Content-Transfer-Encoding: 7BIT
Original-Sender: Peter Stoehr <Peter.Stoehr@DE.EMBL-Heidelberg>
Mime-Version: 1.0
X-Vms-To: IN%"embl-db@daresbury.ac.uk"

------------------------------------------------------------------------------
|  EMBL FILE SERVER News                         Number 9, December 4th 1992 |
|                                                                            |
|  European Molecular Biology Laboratory, Data Library & Computer Group,     |
|  Postfach 10.2209, 6900 Heidelberg, Germany.                               |
|                                             Tel: +49 6221 387258           |
|  E-mail: NetHelp@EMBL-Heidelberg.DE         Fax: +49 6221 387519           |
------------------------------------------------------------------------------


Contents:

 <1> Introduction
 <2> Improvement of EMBL's Internet connectivity
 <3> Anonymous FTP services - alternative sites
 <4> New mail server command
 <5> Updates to data collections
 <6> Updates to software collection
 <7> Other updates
 <8> Summary of directories on the file server
 <9> Getting started ?
 <10> Network addresses at EMBL


<1> Introduction
    ------------

    The EMBL File Server is a facility available on the EMBL computing system
    for external users to request files by electronic mail, anonymous FTP or
    Gopher. The service is free.


<2> Improvement of EMBL's Internet connectivity
    -------------------------------------------

    Recently there has been an improvement in EMBL's Internet connection which
    results in considerably faster access to our anonymous FTP server and to
    our Gopher server, and also a quicker response to e-mail file server
    requests.

    Please note that the FTP and Gopher servers both run on 
                   FTP.EMBL-Heidelberg.DE  (192.54.41.33)
    Do not use the host name EMBL-Heidelberg.DE

<3> Anonymous FTP services - alternative sites
    ------------------------------------------

    (a) The anonymous FTP archives at EMBL are mirrored by an FTP server
        managed by the Israeli EMBnet national node (INN) at the Weizmann
        Institute.
        Internet address: sunbcd.weizmann.ac.il

    (b) Complete copies of the EMBL quarterly releases are also available
        by anonymous FTP from:
        - Swiss EMBnet Node, Biozentrum der Universitaet Basel (Switzerland) 
          Internet address: bioftp.unibas.ch [131.152.8.1]
          Maintained by Reinhard Doelz  (doelz@comp.bioz.unibas.ch)
          Plain ASCII flat files. See the file DESCRIPTION in the top level
          directory for more information, also on other formats of data.

        - Department of Molecular Biology Massachusetts General Hospital
          Internet address: amber.mgh.harvard.edu [132.183.190.26]
          Maintained by Mike Cherry (CHERRY@Frodo.MGH.Harvard.EDU)
          Compressed EMBL flat files. See the 000readme.txt file in the EMBL
          directory for more information, also on other formats of data. 


<4> New mail server command
    -----------------------

    The SIZE command was added to the set of commands recognised by the EMBL
    e-mail server. Because some mailer systems have a maximum file size
    limitation the EMBL mail server splits large files into parts. The default
    size of these packets is 95K but can be changed with the SIZE command.
    E.g. SIZE 30 would change the packet size to 30K, whereas SIZE 500 would
    set it to 500K. Note, however, that uuencoded files are stored in
    individual parts of 90K each on our server, so changing the packet size to
    larger values will have no effect on them.
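
    The effect of the SIZE command on the number of mail parts can be
    estimated with a short calculation. The sketch below is illustrative
    only: it assumes the server splits a file into equal packets by simple
    rounding up, which the text above does not state, and the function name
    parts_needed is hypothetical. It does not model the fixed 90K parts of
    uuencoded files.

```python
import math

DEFAULT_PACKET_KB = 95  # default packet size stated in the newsletter


def parts_needed(file_kb: int, packet_kb: int = DEFAULT_PACKET_KB) -> int:
    """Estimate how many mail parts a file of file_kb kilobytes needs.

    Assumption (not from the newsletter): the server cuts the file into
    packets of at most packet_kb kilobytes, rounding up.
    """
    if file_kb < 0 or packet_kb <= 0:
        raise ValueError("file_kb must be >= 0 and packet_kb must be > 0")
    return max(1, math.ceil(file_kb / packet_kb))


# A 300K file under the default, after SIZE 30, and after SIZE 500.
default_parts = parts_needed(300)
small_parts = parts_needed(300, packet_kb=30)
large_parts = parts_needed(300, packet_kb=500)
```

    Under these assumptions a 300K file arrives in 4 parts by default, in
    10 parts after SIZE 30, and in a single part after SIZE 500.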


<5> Updates to Data Collections
    ---------------------------

    New databases have been added to the file server recently:

    (a) Steven Henikoff's BLOCKS database
        Henikoff, S. and Henikoff, J. G. (1991) Automated assembly of protein
        blocks for database searching. Nucleic Acids Res. 19, 6565-6572.
        E-mail server: directory BLOCKS
        Anonymous ftp: /pub/databases/blocks

    (b) A database of CpG islands in the human genome (CPGISLE)
        Larsen, F., Gundersen, G., Lopez, L. and Prydz, H. (1992) CpG islands as
        Gene Markers in the Human Genome. Genomics 13, 1095-1107.
        E-mail server: directory CPGISLE
        Anonymous ftp: /pub/databases/cpgisle

    (c) A database of protein kinase catalytic domains (PKCDD)
        provided by S.K. Hanks, A.M. Quinn and T. Hunter, Salk Institute.
        E-mail server: directory PKCDD
        Anonymous ftp: /pub/databases/pkcdd

    (d) Pre-release data from the Brookhaven Protein Data Bank (PDB) are now
        available in addition to the full releases.
        E-mail server: directory PROTEINDATA


<6> Updates to Software Collection
    ------------------------------

    Here is a list of new (N) molecular biological programs or updates (U):
    The full path specifications for these files on the EMBL ftp server are
    shown in square brackets.

    DOS:
    ----

    AUTHORIN.UAA        (N) Sequence data submission tool
                            (Intelligenetics/DDBJ/EMBL/GenBank)
                            [/pub/software/dos/authorin.uaa to authorin.uaf]

    CODONS.UUE          (N) Codon usage analysis (A. Lloyd and P. Sharp)
                            [/pub/software/dos/codons.uue]

    CREGEX.C            (U) Conversion of PROSITE to Prosearch format v1.2
                            (J. Leunissen)
                            [/pub/software/dos/cregex.c]

    DOTPLOT.UUE         (N) Dot plot analysis (R. Nakisa)
                            [/pub/software/dos/dotplot.uue]

    ESEE.UAA            (U) Multiple sequence alignment editor v1.09e
                            (E. Cabot)
                            [/pub/software/dos/esee.uaa to esee.uac]

    FASTMAP.UAA         (N) Approx. multipoint lod score calculation
                            (D. Curtis)
                            [/pub/software/dos/fastmap.uaa to fastmap.uac]

    GEPASI.UAA          (N) Modelling of metabolic pathways (P. Mendes)
                            [/pub/software/dos/gepasi.uaa to gepasi.uai]

    MACAW105.UAA        (U) Multiple sequence editor v1.05 (G. Schuler)
                            [/pub/software/dos/macaw105.uaa and macaw105.uab]

    PEDRAW14.UAA        (U) Pedigree drawing program v1.4 (D. Curtis)
                            [/pub/software/dos/pedraw14.uaa to pedraw14.uae]

    RAMHA.UAA           (N) Monte Carlo simulation of random mutagenesis
                            synthetic cDNA (D. Siderovski)
                            [/pub/software/dos/ramha.uaa and ramha.uab]

    SAR2PCIT.UUE        (N) Conversion of SeqAnalRef to ProCite format
                            (E. Sonnhammer)
                            [/pub/software/dos/sar2pcit.uue]

    SORFIND.UAA         (N) Prediction of exons in vertebrate genomic DNA
                            (G. Hutchinson)
                            [/pub/software/dos/sorfind.uaa and sorfind.uab]

    TRBBS.UAA           (N) File exchange program for automated fluorescent DNA
                            sequencer data (I. Consani)
                            [/pub/software/dos/trbbs.uaa and trbbs.uab]
                            
    Mac:
    ----

    AUTHORIN.HQX        (N) Sequence data submission tool
                            (Intelligenetics/DDBJ/EMBL/GenBank)
                            [/pub/software/mac/authorin.hqx]

    DATAMINDER.HQX      (N) Data management tools for molecular biologists
                            (K. Usdin)
                            [/pub/software/mac/dataminder.hqx]

    EMBL-SEARCH.HQX     (U) Database retrieval software for EMBL CD-ROM v2.1.1
                            (EMBL Data Library)
                            [/pub/software/mac/embl-search.hqx]

    EMBL-SEARCH_SRC.HQX (N) Source code for EMBL-Search v2.1.1
                            [/pub/software/mac/embl-search_src.hqx]

    GBSEARCH-NCBI.HQX   (U) Tool to assist access to GenBank servers at NCBI
                            v2.0.2 (D. Gilbert)
                            [/pub/software/mac/gbsearch-ncbi.hqx]

    GELREADER_FPU.HQX   (N) NCSA's GelReader software for Macs with FPU
                            [/pub/software/mac/gelreader_fpu.hqx]

    GELREADER_NO_FPU.HQX (N) NCSA's GelReader software for Macs w/o FPU
                            [/pub/software/mac/gelreader_no_fpu.hqx]
                            
    GELREADER_SAMPLES.HQX (N) Example files for NCSA's GelReader
                            [/pub/software/mac/gelreader_samples.hqx]

    HYPERPCR.HQX        (N) Calculation of PCR conditions (B. Osborne)
                            [/pub/software/mac/hyperpcr.hqx]

    LOOPDLOOP.HQX       (N) Tool for drawing RNA structures (D. Gilbert)
                            [/pub/software/mac/loopdloop.hqx]
                            
    MACPATTERN.HQX      (U) Protein pattern searching with PROSITE and
                            BLOCKS database v.2.0.1 (R. Fuchs)
                            [/pub/software/mac/macpattern.hqx]

    MACT_GENERAL.HQX    (N) MacT package for phylogenetic tree calculation
                            (general programs and documentation) (Luettke)
                            [/pub/software/mac/mact_general.hqx]

    MACT_TREE26.HQX     (N) MacT package for phylogenetic tree calculation
                            (TREE26 programs for up to 26 sequences) (Luettke)
                            [/pub/software/mac/mact_tree26.hqx]

    MACT_TREE4.HQX      (N) MacT package for phylogenetic tree calculation
                            (TREE4 programs for four sequences) (Luettke)
                            [/pub/software/mac/mact_tree4.hqx]

    MACT_TREE5.HQX      (N) MacT package for phylogenetic tree calculation
                            (TREE5 programs for five sequences) (Luettke)
                            [/pub/software/mac/mact_tree5.hqx]

    PUPKIT.HQX          (N) TrueType and Postscript fonts for displaying
                            sequences in Puppy and Kitty representation
                            (U. Melcher)
                            [/pub/software/mac/pupkit.hqx]

    PUPPY.HQX           (U) Special display of nucleic acid and protein
                            sequences v2.0 (U. Melcher)
                            [/pub/software/mac/puppy.hqx]
                        
    STUFFITLITE.HQX     (U) Compression/decompression/binhex program v3.0.3
                            (R. Lau)                            
                            [/pub/software/mac/stuffitlite.hqx or
                             stuffitlite.sea]

    YEASTSTRAINS.HQX    (U) Strain management, in particular yeast
                            (K. Froehlich)
                            [/pub/software/mac/yeaststrains.hqx]

    UNIX:
    -----

    CREGEX.C            (U) Conversion of PROSITE to Prosearch format v1.2
                            (J. Leunissen)
                            [/pub/software/unix/cregex.c]

    ICATOOLS.UAA        (N) Clustering and statistical analysis of large
                            cDNA collections (J. Parsons)
                            [/pub/software/unix/icatools.tar.Z]

    ICRF_CTG.UAA        (N) Tools for ordering clone libraries based on
                            hybridisation data (R. Mott and A. Grigoriev)
                            [/pub/software/unix/icrf_ctg.tar.Z]

    ISSC.UAA            (U) Sensitive sequence alignment package (Oct 92)
                            (P. Argos et al.)
                            [/pub/software/unix/issc.tar.Z]

    MAILFASTA.UUE       (U) Script for using EMBL/GenBank Mail-FASTA servers
                            v3.0 (T. deBoer)
                            [/pub/software/unix/mailfasta.tar.Z]

    OVERSEER.UAA        (U) Package for searching nucleic acid databases
                            (Oct 92) (P. Sibbald)
                            [/pub/software/unix/overseer.tar.Z]

    STATUS.UAA          (N) Tools for managing large DNA-sequencing projects
                            (M. Dubnik)
                            [/pub/software/unix/status.tar.Z]

    ProtQuiz            (N) Xwindows protein 3D/1D display 
    (only available         (M.Scharf, C.Sander)
     from FTP server)       [/pub/software/unix/protquiz/ProtQuiz-0.9.tar.Z]


    VAX:
    ----

    CDACCESS.UAA        (U) Driver software for reading ISO CD-ROMs v2.05
                            (P. Stockwell)
                            [/pub/software/vax/cdaccess.uaa and cdaccess.uab]

    CREGEX.C            (U) Conversion of PROSITE to Prosearch format v1.2
                            (J. Leunissen)
                            [/pub/software/vax/cregex.c]

    GENEIDSHELLS.SHARE  (N) DCL shells for using GENEID server (F. Macrides)
                            [/pub/software/vax/geneidshells.share]

    GRAILSHELLS.SHARE   (N) DCL shells for using GRAIL server (F. Macrides)
                            [/pub/software/vax/grailshells.share]

    ICATOOLS.UAA        (N) Clustering and statistical analysis of large
                            cDNA collections (J. Parsons)
                            [/pub/software/vax/icatools.uaa to icatools.uai]

    ISSC.UAA            (U) Sensitive sequence alignment package (Oct 92)
                            (P. Argos et al.)
                            [/pub/software/unix/issc.uaa to issc.uak]

    NCBISHELLS.SHARE    (N) DCL shells for using GenBank servers (F. Macrides)
                            [/pub/software/vax/ncbishells.share]

    OVERSEER.UAA        (U) Package for searching nucleic acid databases
                            (Oct 92) (P. Sibbald)
                            [/pub/software/unix/overseer.uue]

    SCRUTINE.UAA        (U) Scrutineer, sequence database analysis, Nov 1992
                            (P. Sibbald)
                            [/pub/software/vax/scrutine.uaa to scrutine.uai]


<7> Other updates
    -------------

    (a) A new directory that will hold information for crystallographers,
        XRAY. The only file currently present is the list of e-mail addresses
        of crystallographers and related scientists maintained by M. Teeter,
        Boston College.
        E-mail server: directory XRAY
        Anonymous ftp: /pub/databases/xray

    (b) ALIGN directory:

        DS11144.DAT            - Alignment of insect mtDNA and ND1 gene
                                 products. Submitted by D. Pashley, 12-Jun-1992

        DS12100.DAT            - Alignment of small subunit rRNAs from
                                 higher fungi. Submitted by J. Suguyama,
                                 4-Sep-1992



<8> Summary of directories on the file server
    ---------------------------------------

    Directories with updated information are marked by an asterisk.

                                           Anonymous ftp          NetServ
                                          --------------         ---------
*   EMBL Nucleotide Sequence Database    /pub/databases/embl       NUC
      (Rel. 33, Dec 92 + updates)
*   Eukaryotic Promoter Database         /pub/databases/epd        EPD
      (Rel. 33, Nov 92)
*   SwissProt Protein Sequence Database  /pub/databases/swissprot  PROT
       (Rel. 23, Aug 92 + updates)
*   Prosite pattern database             /pub/databases/prosite    PROSITE
       (Rel. 9.10, Aug 92)
*   ENZYME database                      /pub/databases/enzyme     ENZYME
       (Rel. 10.00, Aug 92)
*   Brookhaven Protein Databank          not available             PROTEINDATA
       (Rel. 61, Jul 92 + pre-release)
*   REBASE, Restriction Enzyme Database  /pub/databases/rebase     REBASE
       (Rel. 9212, Dec 92)
    tRNA sequence and gene sequence db   /pub/databases/trna       TRNA
       (1991)
*   TFD, Transcription Factor Database   /pub/databases/tfd        TFD
       (Ver 5.5, Nov 92)
*   ECD, E.coli Database                 /pub/databases/ecd        ECD
       (Rel. 13, Nov 92)
*   FLYBASE, Drosophila Genetic Map db   /pub/databases/flybase    FLYBASE
       (9209, 8-Sep-1992)
*   LiMB, Listing of Mol. Biol. db's     /pub/databases/limb       LIMB
       (Rel. 3.0)
*   SEQANALREF, Seq. analysis refs       /pub/databases/reflist    REFLIST
       (Rel. 32, Oct 92)
    FANS_REF, Functional analysis refs   /pub/databases/reflist    REFLIST
      (Rel. 3.4, Apr 91)
    Alu sequence database and alignment  /pub/databases/alu        ALU
*   Haemophilia B database               /pub/databases/haemb      HAEMB
      (Rel. 2, Dec 1992)
    Compilation of small RNA sequences   /pub/databases/smallrna   SMALLRNA
      (Oct 91)
    Berlin Databank of 5S rRNA and       /pub/databases/berlin     BERLIN
      5S rRNA gene sequences (1991)
    Compilation of small ribosomal       /pub/databases/rrna       RRNA
      subunit RNA sequences (May 1992)
    CUTG, codon usage                    /pub/databases/cutg       CUTG
      tabulated from GenBank rel. 69
    3D_Ali, 3D alignment database        /pub/databases/3d_ali     3D_ALI
      (March 1992)
    RLDB, Reference Library Database     /pub/databases/rldb       RLDB
      (April 1992)
*   CpG Islands Database                 /pub/databases/cpgisle    CPGISLE
      (Pre-release 1.0, Oct 92)
*   Blocks database                      /pub/databases/blocks     BLOCKS
      (Rel. 5.0, Jun 92)
*   HSSP, sequence-aligned protein       /pub/databases/protein_extras/hssp
          families                                                 PROTEINDATA
*   FSSP, structure-aligned protein      /pub/databases/protein_extras/fssp
          families (ftp only)                                                 
*   DSSP, protein secondary structures   /pub/databases/protein_extras/dssp
                                                                   PROTEINDATA
*   pdb_select, representative sets of   /pub/databases/protein_extras/
                3D proteins (ftp only)                              pdb_select


    Software:

    Software for MS-DOS computers        /pub/software/dos         DOS_SOFTWARE
    Software for Apple Macintosh         /pub/software/mac         MAC_SOFTWARE
    Software for UNIX                    /pub/software/unix        UNIX_SOFTWARE
    Software for VAX/VMS                 /pub/software/vax         VAX_SOFTWARE
    Other software                       /pub/software/misc        MISC_SOFTWARE

    Miscellaneous:

    Technical documents, submission and  /pub/doc                  DOC
      order forms, etc.
    Multiple DNA sequence alignments     /pub/databases/embl/align ALIGN
      and consensus sequences
    Codon Usage tables                   /pub/databases/codonusage CODONUSAGE
*   Crystallographer's information       /pub/databases/xray       XRAY


<9> Getting Started ?
    -----------------
    
    For initial information, send standard electronic mail to the address:
      NetServ@EMBL-Heidelberg.DE
    containing just the word HELP on a line by itself.
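
    A request of this kind could be composed programmatically. The sketch
    below only builds the message and does not send it. The sender address
    and the function name build_request are hypothetical, and whether the
    server expects a subject line is not stated here, so none is set.

```python
from email.message import EmailMessage

NETSERV_ADDRESS = "NetServ@EMBL-Heidelberg.DE"  # address given in the newsletter


def build_request(sender: str, commands: list[str]) -> EmailMessage:
    """Build a mail-server request with one command per line in the body."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = NETSERV_ADDRESS
    # Each command goes on a line by itself, as the newsletter asks for HELP.
    msg.set_content("\n".join(commands) + "\n")
    return msg


# "user@example.org" is a placeholder sender address.
help_request = build_request("user@example.org", ["HELP"])
```

    The resulting message can be handed to any mail transport, for example
    smtplib, once a local mail relay is known.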

    To use the anonymous ftp server, connect to the internet address
      FTP.EMBL-Heidelberg.DE
    using the username "anonymous" (without the quotes !) and giving your
    e-mail address as the password. Look in the directory /pub/help for
    various help files.
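
    The anonymous login described above can be scripted with a standard FTP
    client library. The sketch below uses Python's ftplib and assumes the
    host is reachable from your network. The function names are
    hypothetical, and only the host name, the "anonymous" user name and the
    /pub/help directory come from this newsletter.

```python
from ftplib import FTP

FTP_HOST = "FTP.EMBL-Heidelberg.DE"  # host name given in the newsletter


def list_help_files(ftp, email_address: str) -> list[str]:
    """Log in anonymously on an open connection and list /pub/help."""
    # The e-mail address is given as the password, as requested above.
    ftp.login(user="anonymous", passwd=email_address)
    ftp.cwd("/pub/help")
    return ftp.nlst()


def fetch_help_listing(email_address: str) -> list[str]:
    """Open a connection to the EMBL server and return the help file names."""
    with FTP(FTP_HOST) as ftp:
        return list_help_files(ftp, email_address)
```

    Calling fetch_help_listing("user@example.org") would connect and return
    the names found in /pub/help. The address shown is a placeholder.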

    To use the Gopher server, open a connection to FTP.EMBL-Heidelberg.DE
    at the standard Gopher port 70.


<10> Network addresses at EMBL
    -------------------------

    EMBL File Server (e-mail requests)        NetServ@EMBL-Heidelberg.DE
    FASTA e-mail server                       FASTA@EMBL-Heidelberg.DE
    Quicksearch e-mail server                 Quick@EMBL-Heidelberg.DE
    Anonymous FTP                             FTP.EMBL-Heidelberg.DE

    Problems, feedback (human contact)        NetHelp@EMBL-Heidelberg.DE
    EMBL Data Library enquiries               DataLib@EMBL-Heidelberg.DE
    EMBL Data Library sequence submissions    DataSubs@EMBL-Heidelberg.DE
    Software submissions and problems         Software@EMBL-Heidelberg.DE

From owner-embldatabank@net.bio.net Fri Dec 04 22:00:00 1992
Path: biosci!uwm.edu!zaphod.mps.ohio-state.edu!wupost!uunet!mcsun!sunic!bio.embnet.se!embnet.se!daresbury!bioftp.unibas.ch!comp.bioz.unibas.ch!doelz
From: doelz@comp.bioz.unibas.ch (Reinhard Doelz)
Newsgroups: bionet.molbio.genbank,bionet.molbio.embldatabank
Subject: Alternative database update and syncing
Message-ID: <1992Dec5.160851.20187@comp.bioz.unibas.ch>
Date: 5 Dec 92 16:08:51 GMT
Sender: usenet@comp.bioz.unibas.ch (NEWS transaction account)
Reply-To: doelz@urz.unibas.ch
Organization: EMBnet Switzerland [BASEL]
Lines: 49
Xref: biosci bionet.molbio.genbank:1100 bionet.molbio.embldatabank:123
Nntp-Posting-Host: biox.embnet.unibas.ch

In the past, there was much discussion going on with respect to the 
desired mechanism of feeding updates per news and polling them via 
ftp. Both mechanisms lack the feature of synchronisation, i.e. the only 
way to be sure that you have all the database entries is to ftp the whole 
updates. Networking in the previous years might have worked this way.
However, if hundreds of researchers do this, we run into massive bandwidth
problems. 

I have developed a different scheme which runs on top of the HASSLE
protocol (Hierarchical Access System for Sequence Libraries in Europe). 
The documentation is not ready yet but the client software will be 
available, as well as the database tools. Usage is currently only
anticipated for the EMBL database. 

The idea is that you send a datafile to a server containing all the 
entries as a listing (for simplicity, I stuck to Peter Stoehr's
internal routines which list entry name, accession number, version 
number, and date of entry) and the server compares this list with its 
own database. Then, a file is created which contains all the entries 
the client misses, and is sent back to the client. If there's a need 
the entries can be decomposed into single files at the client's end. 
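
The comparison step can be pictured as a set difference over the listing
keys. The sketch below is not the HASSLE implementation, only an
illustration of the idea under the assumption that an entry is identified
by the four listed fields. The names EntryKey and missing_entries and the
sample entries are hypothetical.

```python
from typing import NamedTuple


class EntryKey(NamedTuple):
    """The four fields the listing carries for each entry."""
    name: str
    accession: str
    version: int
    date: str


def missing_entries(client_listing, server_db):
    """Return the server entries whose key is absent from the client listing.

    server_db maps EntryKey -> entry text. An entry with a newer version
    or date has a different key, so it is also reported as missing.
    """
    have = set(client_listing)
    return {key: text for key, text in server_db.items() if key not in have}


# Hypothetical sample data: the client holds an old version of one entry.
server = {
    EntryKey("ABCD1", "X00001", 2, "01-DEC-1992"): "entry text A (v2)",
    EntryKey("EFGH2", "X00002", 1, "15-NOV-1992"): "entry text B",
}
client = [
    EntryKey("ABCD1", "X00001", 1, "01-SEP-1992"),
    EntryKey("EFGH2", "X00002", 1, "15-NOV-1992"),
]
to_send = missing_entries(client, server)
```

Here only the updated ABCD1 entry would be sent back, so the transfer
grows with the number of changed entries rather than with the size of the
whole update.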

The data transmission runs asynchronously on a special socket (port 375),
and uses on-the-fly compression and encryption. Accounting and security 
are standard. 

Alternative mechanisms include a request to send all new entries since 
a given date, and the full daily update release. 

The software currently runs on VAX/VMS with UCX,TCPWARE or MULTINET, 
and IRIX,SunOS and ULTRIX. It was developed on a Silicon Graphics 
running the CodeVision environment. Thanks go to Lukas Rosenthaler 
for contributing the compression code, and to Basel University and the 
Swiss National Science Foundation for grant support. 

Mail me if you are interested. 

Regards 
Reinhard 

-- 
+----------------------------------+-------------------------------------+
|    Dr. Reinhard Doelz            | RFC     doelz@urz.unibas.ch         |
|      Biocomputing                | DECNET  20579::48130::doelz         |
|Biozentrum der Universitaet       | X25     022846211142036::doelz      |
|   Klingelbergstrasse 70          | FAX     x41 61 261- 6760 or 267- 2078     
|     CH 4056 Basel                | TEL     x41 61 267- 2076 or 2247    |   
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
               -----------------------------------------

From owner-embldatabank@net.bio.net Thu Dec 10 22:00:00 1992
Path: biosci!agate!usenet.ins.cwru.edu!magnus.acs.ohio-state.edu!zaphod.mps.ohio-state.edu!sdd.hp.com!caen!batcomputer!cornell!uw-beaver!news.u.washington.edu!serval!cosy.chem.wsu.edu!alam
From: alam@cosy.chem.wsu.edu (Steve Alam)
Newsgroups: bionet.molbio.embldatabank
Subject: Complete sequence for Pet3D
Message-ID: <1992Dec10.190516.4827@serval.net.wsu.edu>
Date: 10 Dec 92 19:05:16 GMT
Sender: news@serval.net.wsu.edu (USENET News System)
Organization: Washington State University
Lines: 10

Keywords: We're lost

Help!
We are running into trouble in trying to decipher the DNA sequence of Pet3D expression
vector near the areas where the multiple cloning site was placed into pBR322.  Does
anybody have the complete sequence of the regions upstream of the T7 promoter and downstream 
of the terminator??  Or can anybody give us an idea on how to obtain it?

Thanx a bunch...
From the NW

From owner-embldatabank@net.bio.net Thu Dec 17 22:00:00 1992
Path: biosci!daresbury!buzz.bmc.uu.se!embl-heidelberg.de!stoehr
From: stoehr@embl-heidelberg.de
Newsgroups: bionet.molbio.embldatabank
Subject: EMBL CD-ROM Dec. 92 is ready
Message-ID: <1992Dec18.153856.60125@embl-heidelberg.de>
Date: 18 Dec 92 14:38:56 GMT
Organization: EMBL, European Molecular Biology Laboratory
Lines: 31

The December 1992 issue of the EMBL CD-ROM (2 disks) is now being distributed,
as well as release 33 of the EMBL Nucleotide Sequence Database on tape. We
anticipate that all orders will be shipped before Christmas.
The CD-ROM contains several databases as follows:

EMBL Nucleotide Sequence Database (EMBL Data Library)
Swiss-Prot Protein Sequence Database (Amos Bairoch & EMBL Data Library)
+
Berlin RNA database (T. Specht et al)
BLOCKS database (Steven Henikoff)
EPD, Eukaryotic Promoter Database (Philip Bucher)
ECD, E. coli database (Manfred Kroeger)
CpG Island database (F. Larsen et al)
CUTG, codon usage tables (K. Wada et al)
ENZYME database (Amos Bairoch)
HSSP, structure-sequence alignments (Reinhard Schneider, Chris Sander)
KABAT dictionary of sequences of immunological interest (Elvin Kabat)
PROSITE pattern database (Amos Bairoch)
REBASE, Restriction enzyme database (Rich Roberts)
FlyBase, Drosophila genetic map database (Michael Ashburner)
HaemB, Haemophilia B database (George Brownlee)
RLDB, Reference Library Database (Georg Zehetner)
PKCDD, Protein Kinase Catalytic Domain database (Steven Hanks et al)
rRNA, ribosomal RNA database (Rupert De Wachter et al)
SeqAnalRef, sequence analysis bibliography (Amos Bairoch)
TFD, Transcription Factor database (David Ghosh)
tRNA, transfer RNA sequence database (M. Sprinzl et al)
SmallRNA, database of small RNA sequences (S. Gupta, R. Reddy)

Peter Stoehr
EMBL Data Library

From owner-embldatabank@net.bio.net Thu Dec 17 22:00:00 1992
Path: biosci!daresbury!buzz.bmc.uu.se!embl-heidelberg.de!stoehr
From: stoehr@embl-heidelberg.de
Newsgroups: bionet.molbio.embldatabank
Subject: EMBL rel. 33 release notes (extracts)
Message-ID: <1992Dec18.155049.60126@embl-heidelberg.de>
Date: 18 Dec 92 14:50:49 GMT
Organization: EMBL, European Molecular Biology Laboratory
Lines: 97

The following are some extracts from the release notes of release 33 of the
EMBL Nucleotide Sequence Database, December 1992. Comments are welcome, either
via the BIOSCI newsgroup bionet.molbio.embldatabank, or by e-mail to
datalib@embl-heidelberg.de
--------------------------

A breakdown of Release  33 by taxonomic division is shown below:

                  Division             Entries    Nucleotides
                  -----------------    -------    -----------
                  Bacteriophage            825        1077956
                  Fungi                   3675        6948386
                  Invertebrates           9014       11077413
                  Organelles              2925        4272437
                  Other Mammals           3075        4049195
                  Other Vertebrates       3859        4652366
                  Plants                  5369        6681381
                  Primates               22165       21151438
                  Prokaryotes            10926       18322905
                  Rodents                14189       15963099
                  Synthetic               1309        1096823
                  Unclassified            2658        2393182
                  Viruses                 9111       13727398
                  -----------------    -------    -----------
                  Total                  89100      111413979

This represents an increase of about 12% from release 32.


2  FORTHCOMING CHANGES

2.1  RA Line Author Name Format

As from Release 34 in March 1993 we will change the format of author names on RA
lines  to conform to that used by major bibliographic databases such as Medline.
The main change is that the periods which currently appear within initials  will
not appear any more.

For example, the current:

     RA   Wilson A., Smith B.G.;

will then appear as:

     RA   Wilson A, Smith BG;
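
Software that needs to compare old and new entries could normalise RA lines
itself. The sketch below is one possible approach, not an official tool. It
assumes that periods occur only within initials, so it would also strip a
period from a surname such as "St. John". The function name convert_ra_line
is hypothetical.

```python
def convert_ra_line(line: str) -> str:
    """Rewrite an RA line from the current format to the Release 34 format.

    Assumption: every period on the line belongs to an author's initials.
    """
    if not line.startswith("RA"):
        raise ValueError("not an RA line: %r" % line)
    # Keep the two-letter line code and its padding, strip periods after it.
    tag, names = line[:5], line[5:]
    return tag + names.replace(".", "")


converted = convert_ra_line("RA   Wilson A., Smith B.G.;")
```

Applied to the example above, this yields "RA   Wilson A, Smith BG;".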


2.2  Patent Sequences

A collaboration between the EMBL Data Library and  the  European  Patent  Office
will  enable  us  to  include patent sequences in our quarterly distributions as
from Release 34 in March 1993.  These sequences will  each  have  a  new  patent
reference type, to document the source of the data.  Patent-specific information
will appear in the RL line block (introduced by the keyword  "Patent")  and  the
other reference linetypes (RN, RP, RC, RA, RT) will appear as usual.  An example
of a patent RL line block is shown below:

     RL   Patent number EP0062971-A/1, 20-OCT-1982.
     RL   IMPERIAL CHEMICAL INDUSTRIES PLC.
     RL   UNIVERSITY OF LEICESTER.

The date on the first line in the RL  block  is  the  publication  date  of  the
patent, and the following line(s) list the patent applicant(s).
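
A parser for such a block might look like the sketch below. It is based
only on the single example shown here, so the regular expression and the
date pattern are assumptions that may not cover every patent reference. The
name parse_patent_block is hypothetical.

```python
import re

# Pattern inferred from the one example above; real data may vary.
PATENT_RE = re.compile(
    r"^Patent number (?P<number>\S+), (?P<date>\d{2}-[A-Z]{3}-\d{4})\.$"
)


def parse_patent_block(lines):
    """Split a patent RL block into number, publication date and applicants."""
    texts = [line[5:].strip() for line in lines if line.startswith("RL")]
    if not texts:
        raise ValueError("no RL lines found")
    match = PATENT_RE.match(texts[0])
    if match is None:
        raise ValueError("first RL line is not a patent reference")
    # Every following line names one applicant, terminated by a period.
    applicants = [text.rstrip(".") for text in texts[1:]]
    return {
        "number": match["number"],
        "date": match["date"],
        "applicants": applicants,
    }


patent = parse_patent_block([
    "RL   Patent number EP0062971-A/1, 20-OCT-1982.",
    "RL   IMPERIAL CHEMICAL INDUSTRIES PLC.",
    "RL   UNIVERSITY OF LEICESTER.",
])
```

For the example block this gives the number EP0062971-A/1, the date
20-OCT-1982 and two applicants.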


2.3  Molecule Topology

As from Release 35 in June 1993 we will indicate molecules which are known to be
circular  by  prefixing  the  molecule  type  on  the  ID  line with the keyword
"circular".  Other topology keywords may appear in the same location  at  future
releases.   Please  note that the absence of the keyword "circular" should *not*
be taken to indicate that the molecule is definitely known not to be circular.

An example of such a circular molecule's ID line is shown below:

     ID   CLSPC1     standard; circular DNA; ROD; 346 BP.
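
Programs that read ID lines will need to tolerate the optional keyword. The
sketch below shows one way to do so, assuming the field order of the example
above. It reports the topology as None when no keyword is present, since
absence of "circular" does not mean the molecule is linear. The name
parse_id_line is hypothetical.

```python
def parse_id_line(line: str) -> dict:
    """Parse an ID line, allowing for an optional topology keyword."""
    if not line.startswith("ID"):
        raise ValueError("not an ID line: %r" % line)
    name, rest = line[2:].strip().split(None, 1)
    fields = [field.strip() for field in rest.split(";")]
    data_class, molecule, division, length = fields[:4]
    words = molecule.split()
    # "circular" is the only topology keyword announced so far.
    if words and words[0] == "circular":
        topology, molecule_type = "circular", " ".join(words[1:])
    else:
        topology, molecule_type = None, molecule  # None means "not stated"
    return {
        "name": name,
        "class": data_class,
        "topology": topology,
        "molecule": molecule_type,
        "division": division,
        "length": int(length.split()[0]),
    }


record = parse_id_line("ID   CLSPC1     standard; circular DNA; ROD; 346 BP.")
```

For the example line, the topology is "circular", the molecule type is
"DNA" and the length is 346.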


2.4  Sequence Numbering

To aid reading the sequence bases in  database  entries,  we  will  insert  base
numbers  in columns 73-80 of each sequence line as from Release 35 in June 1993.
The numbers will be right justified, and will indicate the number  of  the  last
base on each line.  An example is shown below (the ruler is for your convenience
and will not appear in the database entries):

1       10        20        30        40        50        60        70        80
+--------+---------+---------+---------+---------+---------+---------+---------+
SQ   Sequence  245 BP; 60 A; 44 C; 77 G; 64 T; 0 other;
     agatcttctg ctcccaggag agagagcaat gtctagagta gggaaaagga ccatcttagc        60
     cctctactat aggcagctgt ctgctacccg tcactcacca atgggagagg aggcatgggt       120
     attgtgttca gatggggccc agtgttattt atttgagact ggatcagggt gagaacttga       180
     ggggaagggt tggagtagaa ggttatgatc tttctagaca gtgctgcatt ggtggcttga       240
     ctgac                                                                   245
//
+--------+---------+---------+---------+---------+---------+---------+---------+
1       10        20        30        40        50        60        70        80
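
The layout above can be reproduced with a few lines of code. The sketch
below assumes 60 bases per line in blocks of ten, as in the example, with
the base number right justified in columns 73-80. The function name
format_sequence_lines is hypothetical.

```python
def format_sequence_lines(sequence: str) -> list[str]:
    """Format sequence lines with the last base number in columns 73-80."""
    lines = []
    for start in range(0, len(sequence), 60):
        chunk = sequence[start:start + 60]
        # Blocks of ten bases separated by single spaces, after a 5-column indent.
        blocks = [chunk[i:i + 10] for i in range(0, len(chunk), 10)]
        text = "     " + " ".join(blocks)
        last_base = start + len(chunk)
        # Pad to column 72, then right-justify the number in an 8-column field.
        lines.append(text.ljust(72) + str(last_base).rjust(8))
    return lines


first_line = format_sequence_lines(
    "agatcttctgctcccaggagagagagcaatgtctagagtagggaaaaggaccatcttagc"
)[0]
```

Each line produced this way is exactly 80 characters wide, and first_line
matches the first sequence line of the example.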

From owner-embldatabank@net.bio.net Thu Dec 17 22:00:00 1992
Path: biosci!uwm.edu!zaphod.mps.ohio-state.edu!magnus.acs.ohio-state.edu!usenet.ins.cwru.edu!agate!doc.ic.ac.uk!daresbury!buzz.bmc.uu.se!embl-heidelberg.de!stoehr
From: stoehr@embl-heidelberg.de
Newsgroups: bionet.molbio.embldatabank
Subject: Re: EMBL CD-ROM Dec. 92 is ready
Message-ID: <1992Dec18.170620.60127@embl-heidelberg.de>
Date: 18 Dec 92 16:06:20 GMT
References: <1992Dec18.153856.60125@embl-heidelberg.de>
Organization: EMBL, European Molecular Biology Laboratory
Lines: 9

In article <1992Dec18.153856.60125@embl-heidelberg.de>,
stoehr@embl-heidelberg.de writes:

> RLDB, Reference Library Database (Georg Zehetner)

Sorry, that should be Guenther Zehetner.

Peter Stoehr
EMBL Data Library

From owner-embldatabank@net.bio.net Tue Dec 22 22:00:00 1992
Path: biosci!daresbury!bioftp.unibas.ch!comp.bioz.unibas.ch!doelz
From: doelz@comp.bioz.unibas.ch (Reinhard Doelz)
Newsgroups: bionet.molbio.genbank,bionet.molbio.embldatabank
Subject: Re: EMBL <> GenBank
Message-ID: <1992Dec23.081643.11004@comp.bioz.unibas.ch>
Date: 23 Dec 92 08:16:43 GMT
References: <1992Dec22.142144.18782@husc3.harvard.edu> <1992Dec22.220954.563@nlm.nih.gov>
Sender: usenet@comp.bioz.unibas.ch (NEWS transaction account)
Reply-To: doelz@urz.unibas.ch
Organization: EMBnet Switzerland [BASEL]
Lines: 76
Xref: biosci bionet.molbio.genbank:1133 bionet.molbio.embldatabank:128
Nntp-Posting-Host: biox.embnet.unibas.ch

In article <1992Dec22.220954.563@nlm.nih.gov>, ostell@object.nlm.nih.gov (Jim Ostell) writes:
|> Actually there are a large number of differences between EMBL and
|> GenBank, including entries with the same primary accession but
|> different sequences in EMBL and GenBank, entries with the same
|> sequence but different primary accessions in EMBL and GenBank,
|> and all sorts of other variants.  There are a variety of reasons
...

It has been brought to our attention by one of our customers that there 
are currently thousands (!) of these cases. In contrast to the original 
assumption that Genbank 74 will now be sort of identical to EMBL 33, I 
can only warn all of you who trusted in this rather than trying it out. 

In fact, running an accession number exclusion (I use the GCG package for
this purpose here) yields more than 10000 entries carrying accession
numbers that are in GENBANK 74 but not in EMBL 33. The listing, keyed by
division, is shown here:

Gb_Ba:
     Entries:   1,715   Accession Numbers:  2,692
Gb_In:
     Entries:   1,037   Accession Numbers:  1,494
Gb_Om:
     Entries:     446   Accession Numbers:    634
Gb_EST:
     Entries:   1,000   Accession Numbers:  1,000
Gb_Ov:
     Entries:     487   Accession Numbers:    694
Gb_Ph:
     Entries:      76   Accession Numbers:    134
Gb_Pl:
     Entries:   1,707   Accession Numbers:  2,512
Gb_Pr:
     Entries:   3,096   Accession Numbers:  3,743
Gb_Ro:
     Entries:   1,933   Accession Numbers:  2,601
Gb_St:
     Entries:      61   Accession Numbers:     63
Gb_Sy:
     Entries:       5   Accession Numbers:      5
Gb_Un:
     Entries:       8   Accession Numbers:      8
Gb_Vi:
     Entries:     747   Accession Numbers:    938

The statistics read as follows: 

Numbers          Total   Primary numbers
D...              1093              1027
J...                15                 3
K...                 4                 2
L...               219               219
M...               966               107
S...              2900              2899
V...                 3                 0
X...               468                 6
Z...                30                 6

Total             5698              4269

The only thing which is not clear to me is, then, why these 5700 numbers
are present in more than 15000 Genbank entries. Clearly a point where 
a cleanup might be necessary... 
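
For readers without GCG, the kind of exclusion described here can be
sketched as a plain set difference in Python. This is an assumption about
the general technique, not a description of what the GCG software does
internally; the function names are hypothetical.

```python
from collections import Counter


def accessions_missing_from(reference, other):
    """Return the accession numbers found in `other` but not in `reference`."""
    return set(other) - set(reference)


def count_by_prefix(accessions):
    """Count accession numbers by their leading letter (D, J, K, L, ...)."""
    return Counter(acc[0].upper() for acc in accessions if acc)
```

Given the accession numbers of EMBL 33 as `reference` and those of
GENBANK 74 as `other`, the first function would return the numbers present
only in GENBANK, and the second would tally them by first letter in the
manner of the statistics above.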

Regards
Reinhard

-- 
+----------------------------------+-------------------------------------+
|    Dr. Reinhard Doelz            | RFC     doelz@urz.unibas.ch         |
|      Biocomputing                | DECNET  20579::48130::doelz         |
|Biozentrum der Universitaet       | X25     022846211142036::doelz      |
|   Klingelbergstrasse 70          | FAX     x41 61 261- 6760 or 267- 2078     
|     CH 4056 Basel                | TEL     x41 61 267- 2076 or 2247    |   
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
               -----------------------------------------

From owner-embldatabank@net.bio.net Tue Dec 22 22:00:00 1992
Path: biosci!daresbury!bioftp.unibas.ch!embl-heidelberg.de!stoehr
From: stoehr@embl-heidelberg.de
Newsgroups: bionet.molbio.genbank,bionet.molbio.embldatabank
Subject: Re: EMBL <> GenBank
Message-ID: <1992Dec23.160654.60156@embl-heidelberg.de>
Date: 23 Dec 92 15:06:54 GMT
References: <1992Dec22.142144.18782@husc3.harvard.edu> <1992Dec22.220954.563@nlm.nih.gov> <1992Dec23.081643.11004@comp.bioz.unibas.ch>
Organization: EMBL, European Molecular Biology Laboratory
Lines: 23
Xref: biosci bionet.molbio.genbank:1134 bionet.molbio.embldatabank:129

In article <1992Dec23.081643.11004@comp.bioz.unibas.ch>,
doelz@comp.bioz.unibas.ch (Reinhard Doelz) writes:
> 
> It has been brought to our attention by one of our customers that there 
> are currently thousands (!) of these cases. In contrast to the original 
> assumption that Genbank 74 will now be sort of identical to EMBL 33, I 
> can only warn all of you who trusted in this rather than trying it out. 
> 

Firstly, the assumption that GenBank 74 should be 'sort of identical' to
EMBL 33 is false (depending on how identical 'sort of' means). The two
databases are made at different times.
We do not yet have Genbank 74 here: when we do, as for other quarterly
releases, we determine what we are missing and work to include it all in the
EMBL database. I'm surprised at the figure of 10,000, as I believe that when
we made EMBL 33, all acc#'s from the previous GenBank release were in.
Another current difference is the accession numbers beginning with 'S' (about
3000 according to your figures) which we do not include in EMBL yet - but
that's a different story.

Regards,
Peter Stoehr
EMBL Data Library

From owner-embldatabank@net.bio.net Tue Dec 22 22:00:00 1992
Path: biosci!daresbury!bioftp.unibas.ch!comp.bioz.unibas.ch!rdoelz
From: rdoelz@comp.bioz.unibas.ch [remote] (Reinhard Doelz)
Newsgroups: bionet.molbio.genbank,bionet.molbio.embldatabank
Subject: Re: EMBL <> GenBank
Message-ID: <1992Dec23.193147.21458@comp.bioz.unibas.ch>
Date: 23 Dec 92 19:31:47 GMT
References: <1992Dec22.142144.18782@husc3.harvard.edu> <1992Dec22.220954.563@nlm.nih.gov> <1992Dec23.081643.11004@comp.bioz.unibas.ch> <1992Dec23.160654.60156@embl-heidelberg.de>
Sender: usenet@comp.bioz.unibas.ch (NEWS transaction account)
Reply-To: doelz@urz.unibas.ch
Organization: EMBnet Switzerland [BASEL]
Lines: 63
Xref: biosci bionet.molbio.genbank:1135 bionet.molbio.embldatabank:130
Nntp-Posting-Host: biox.embnet.unibas.ch

In article <1992Dec23.160654.60156@embl-heidelberg.de>, stoehr@embl-heidelberg.de writes:
|> In article <1992Dec23.081643.11004@comp.bioz.unibas.ch>,
|> doelz@comp.bioz.unibas.ch (Reinhard Doelz) writes:
|> > 
|> > It has been brought to our attention by one of our customers that there 
|> > are currently thousands (!) of these cases. In contrast to the original 
|> > assumption that Genbank 74 will now be sort of identical to EMBL 33, I 
|> > can only warn all of you who trusted in this rather than trying it out. 
|> > 
|> 
|> Firstly, the assumption that GenBank 74 should be 'sort of identical' to
|> EMBL 33 is false (depending on how identical 'sort of' means). The two
|> databases are made at different times.

Sorry if this was misleading - I relied on the release notes, which state
"New and updated sequence data from the latest GenBank and  DDBJ  releases
have been  incorporated  into  this release." With the freeze date of EMBL
Release 33 on 8 November 1992 and that of GENBANK 74 on 17 November 1992,
I thought this implied that the most recent daily updates from GENBANK
would also be in EMBL, and vice versa.

|> We do not yet have Genbank 74 here: when we do, as for other quarterly
|> releases, we determine what we are missing and work to include it all in the
|> EMBL database. I'm surprised at the figure of 10,000, as I believe that when
|> we made EMBL 33, all acc#'s from the previous GenBank release were in.
|> Another current difference is the accession numbers beginning with 'S' (about
|> 3000 according to your figures) which we do not include in EMBL yet - but
|> that's a different story.

I used the -exclude flag in the GCG software to compute the numbers that I 
quoted. The entry number is surprising to me also, and I just hope that 
there is an error in my procedures. 

With respect to the numbers: I have taken the daily EMBL updates (as of Dec 20
at EMBL creation) and the GENBANK 74 release files (not their daily updates),
and came up with 8141 EMBL entries, representing 8540 accession numbers.
(This covers new entries from both EMBL and GENBANK, as sent by EMBL in
EMBL format.) If I look at how many entries are in GENBANK 74 but not in
EMBL 33 (containing 5698 AN), and subtract those which are unique to EMBL, it
appears that the EMBL updates contain 2591 entries which are in GENBANK 74
(corresponding to 2612 accession numbers). That still leaves us with a
considerable number of discrepancies.
|> 
|> Regards,
|> Peter Stoehr
|> EMBL Data Library

Your work is most appreciated, and I encourage you to keep up the good
work. I didn't want to make any negative statements on quality in general;
I just wanted to point out that the exclusion sets are still needed.

Regards 
Reinhard 

-- 
+----------------------------------+-------------------------------------+
|    Dr. Reinhard Doelz            | RFC     doelz@urz.unibas.ch         |
|      Biocomputing                | DECNET  20579::48130::doelz         |
|Biozentrum der Universitaet       | X25     022846211142036::doelz      |
|   Klingelbergstrasse 70          | FAX     x41 61 261- 6760 or 267- 2078     
|     CH 4056 Basel                | TEL     x41 61 267- 2076 or 2247    |   
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
               -----------------------------------------

