From owner-embldatabank@net.bio.net Thu Jul 01 23:00:00 1993
Path: biosci!daresbury!bioftp.unibas.ch!comp.bioz.unibas.ch!doelz
From: doelz@comp.bioz.unibas.ch (Reinhard Doelz)
Newsgroups: bionet.molbio.embldatabank
Subject: Re: secondary accession numbers
Message-ID: <1993Jul2.055206.7673@comp.bioz.unibas.ch>
Date: 2 Jul 93 05:52:06 GMT
References: <1993Jul1.234108.1@molbiol.ox.ac.uk>
Sender: usenet@comp.bioz.unibas.ch (NEWS transaction account)
Reply-To: doelz@urz.unibas.ch
Followup-To: bionet.software.gcg
Organization: EMBnet Switzerland [BASEL]
Lines: 65
Nntp-Posting-Host: biox.embnet.unibas.ch

In article <1993Jul1.234108.1@molbiol.ox.ac.uk>, rhubner@molbiol.ox.ac.uk writes:
...
|> may represent *secondary* accession numbers and can not be searched for by
|> FETCH (and especially STRINGSEARCH; only ref fields) in GCG. This was the
                                            ^^^
                                            DE

The problem you describe applies only to those secondary 
accession numbers which occur more than once. 

The accession numbers are used as identifiers in GCG. Look at the .numbers
file and see

biox > more /bioy/data/xembl/xembl.numbers
D00410    10824 S
D00515     9593 S
D00683     3219 S
D00684     3219 S
D00739    11834 S
D00819    10077 P
D00821    11584 S
D00844     3136 P
D00849     3049 S
D01022     6221 P
...

'S' for secondary and 'P' for primary. 
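
A minimal Python sketch of reading such a listing, assuming three
whitespace-separated columns as in the excerpt above. The meaning of the
middle number is not stated here, so the sketch keeps it as an opaque
integer. The names parse_numbers and NumbersEntry are made up for this
illustration and are not part of GCG.

```python
from typing import NamedTuple


class NumbersEntry(NamedTuple):
    accession: str
    number: int     # middle column; its meaning is not stated in the post
    primary: bool   # True for 'P', False for 'S'


def parse_numbers(text: str) -> list[NumbersEntry]:
    """Parse lines such as 'D00410    10824 S' into entries."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the '...' elision and anything unexpected.
        if len(fields) != 3 or not fields[1].isdigit() or fields[2] not in ("P", "S"):
            continue
        entries.append(NumbersEntry(fields[0], int(fields[1]), fields[2] == "P"))
    return entries


SAMPLE = """\
D00410    10824 S
D00819    10077 P
D00844     3136 P
...
"""

for entry in parse_numbers(SAMPLE):
    print(entry.accession, "primary" if entry.primary else "secondary")
```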

If the secondary accession number occurs only *once*, then GCG will work 
because it is unique.

In your case, 
/bioy/data/gcgembl/em_pr.ref:AC   M18693; J03516; M18691;
/bioy/data/gcgembl/em_pr.ref:AC   M18694; J03516; M18691;
/bioy/data/gcgembl/em_pr.ref:AC   M18695; J03516; M18691;
/bioy/data/gcgembl/em_pr.ref:AC   M18696; J03516; M18691;
/bioy/data/gcgembl/em_pr.ref:AC   M18697; J03516; M18691;
/bioy/data/gcgembl/em_pr.ref:AC   M18698; J03516; M18691;
/bioy/data/gcgembl/em_pr.ref:AC   M18699; J03516; M18691;
/bioy/data/gcgembl/em_pr.ref:AC   M18700; J03516; M18691;
/bioy/data/gcgembl/em_pr.ref:AC   M18692; J03516;

we can see that J03516 occurs more than once. 
Followups redirected to bionet.software.gcg. 
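
The check above can be sketched in Python. This is an illustration only,
assuming that the first number on each AC line is the primary accession
number and the rest are secondary, and that the lines look like the grep
output shown. The function name ambiguous_secondaries is hypothetical.

```python
from collections import defaultdict


def ambiguous_secondaries(ac_lines):
    """Return {secondary: [primaries...]} for every secondary accession
    number that is listed by more than one entry."""
    owners = defaultdict(list)
    for line in ac_lines:
        # Drop everything up to 'AC ' (for example a grep file-name prefix).
        _, _, rest = line.partition("AC ")
        numbers = [n.strip() for n in rest.split(";") if n.strip()]
        if not numbers:
            continue
        primary, secondaries = numbers[0], numbers[1:]
        for sec in secondaries:
            owners[sec].append(primary)
    return {sec: prims for sec, prims in owners.items() if len(prims) > 1}


LINES = [
    "/bioy/data/gcgembl/em_pr.ref:AC   M18693; J03516; M18691;",
    "/bioy/data/gcgembl/em_pr.ref:AC   M18694; J03516; M18691;",
    "/bioy/data/gcgembl/em_pr.ref:AC   M18692; J03516;",
]

for sec, prims in ambiguous_secondaries(LINES).items():
    print(sec, "is a secondary number of", ", ".join(prims))
```

A secondary number that maps to a single primary would not appear in the
result, which matches the case where the lookup still works.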

> The SRS software can do the job properly...

I work with SRS, and can recommend the software for these purposes. 
Thure Etzold made the code available without cost and I appreciate this 
very much. The indices for the full set of sequence databases (PIR,
SWISSPROT, EMBL, GENBANK exclusion) occupy only about 130 MByte disk
space which is relatively little as compared to 500 MByte of data. 

Regards
Reinhard 

-- 
+----------------------------------+-------------------------------------+
|    Dr. Reinhard Doelz            | RFC     doelz@urz.unibas.ch         |
|      Biocomputing                | DECNET  20579::48130::doelz         |
|Biozentrum der Universitaet       | X25     022846211142036::doelz      |
|   Klingelbergstrasse 70          | FAX     x41 61 261- 6760 or 267- 2078     
|     CH 4056 Basel                | TEL     x41 61 267- 2076 or 2247    |   
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
               -----------------------------------------

From owner-embldatabank@net.bio.net Tue Jul 06 23:00:00 1993
Path: biosci!daresbury!buzz.bmc.uu.se!embl-heidelberg.de!fuchs
From: fuchs@embl-heidelberg.de
Newsgroups: bionet.molbio.embldatabank
Subject: EMBL File Server Newsletter 10
Message-ID: <1993Jul7.150539.100470@embl-heidelberg.de>
Date: 7 Jul 93 14:05:39 GMT
Reply-To: NetHelp@EMBL-Heidelberg.DE
Organization: EMBL, European Molecular Biology Laboratory
Lines: 480

------------------------------------------------------------------------------
|  EMBL FILE SERVER News                            Number 10, July 7th 1993 |
|                                                                            |
|  European Molecular Biology Laboratory, Data Library & Computer Group,     |
|  Postfach 10.2209, 69012 Heidelberg, Germany.                              |
|                                             Tel: +49 6221 387258           |
|  E-mail: NetHelp@EMBL-Heidelberg.DE         Fax: +49 6221 387519           |
------------------------------------------------------------------------------


Contents:

 <1> Introduction
 <2> Blitz e-mail server
 <3> Updates to data collections
 <4> Updates to software collection
 <5> Other updates
 <6> Summary of directories on the file server
 <7> Getting started ?
 <8> Network addresses at EMBL


<1> Introduction
    ------------

    The EMBL File Server is a facility available on the EMBL computing system
    for external users to request files by electronic mail, anonymous FTP or
    Gopher. The service is free.

<2> Blitz e-mail server
    -------------------
    
    BLITZ is an automatic electronic mail server for the MPsrch program of 
    Shane Sturrock and John Collins, Biocomputing Research Unit, University
    of Edinburgh, Scotland.  

    MPsrch allows you to perform sensitive and extremely fast comparisons of
    your protein sequences against the Swiss-Prot protein sequence database
    using the Smith and Waterman best local similarity algorithm. It runs
    on the MasPar family of massively parallel machines; the BLITZ server
    uses a 4096-processor MasPar MP-1 system.   A typical search time for a
    query sequence of 400 amino acids is approximately 40 seconds to search
    the entire Swiss-Prot database.
    Additional time is required to reconstruct the alignments; the time for
    this will depend on the number of alignments requested.  MPsrch is the
    fastest implementation of the SW algorithm currently available on any
    machine.  

    How to use BLITZ
    ----------------

    Send a properly formatted electronic mail message to 

                    BLITZ@EMBL-Heidelberg.DE

    containing required commands and parameters and the answer will be 
    automatically mailed to you. To obtain full documentation of the service
    and algorithms used, send a message containing just the word HELP.

    If you have any problems using the BLITZ service, or any questions, please
    send them to:
                    NETHELP@EMBL-Heidelberg.DE

    Example
    -------

    SEQ
      STKKKPLTQE QLEDARRLKA IYEKKKNELG LSQESVADKM GMGQSGVGAL
      FNGINALNAY NAALLAKILK VSVEEFSPSI AREIYEMYEA VSMQPSLRSE
      YEYPVFSHVQ AGMFSPELRT FTKGDAERWV STTKKASDSA FWLEVEGNSM
      TAPTGSKPSF PDGMLILVDP EQAVEPGDFC IARLGGDEFT FKKLIRDSGQ
      VFLQPLNPQY PMIPCNESCS VVGKVIASQW PEETFG
    END
 
    The possible parameters (eg gap penalty, weight matrix) are explained in
    the HELP documentation.
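
    A small Python sketch that lays out a message body like the example
    above. It only mirrors what the example shows (SEQ, the sequence, END).
    Whether the server requires the ten-residue blocks is not stated here,
    and the function name blitz_message is made up for illustration. The
    authoritative command list is in the HELP documentation.

```python
def blitz_message(sequence: str, group: int = 10, per_line: int = 5) -> str:
    """Format a protein sequence between SEQ and END lines."""
    residues = "".join(sequence.split()).upper()
    if not residues.isalpha():
        raise ValueError("sequence must be non-empty and contain only letters")
    # Cut the sequence into blocks of `group` residues.
    blocks = [residues[i:i + group] for i in range(0, len(residues), group)]
    lines = ["SEQ"]
    # Put `per_line` blocks on each line, separated by single spaces.
    for i in range(0, len(blocks), per_line):
        lines.append("  " + " ".join(blocks[i:i + per_line]))
    lines.append("END")
    return "\n".join(lines) + "\n"


print(blitz_message("STKKKPLTQEQLEDARRLKAIYEKKKNELGLSQESVADKMGMGQSGVGALFNGINALNAY"))
```

    The resulting text would be pasted into a mail message addressed to
    the BLITZ server; sending the mail is left out of the sketch.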


<3> Updates to Data Collections
    ---------------------------

    New databases have been added to the file server since Newsletter issue 9:

    (a) RELIBRARY - Restriction Enzyme List, by E. Raschke
        Raschke, E. (1993) Comprehensive restriction enzyme lists to update
        any DNA sequence computer program. Genetic Analysis, Techniques and
        Applications, Vol. 10, in press.

    (b) REPBASE - Prototypic sequences for human repetitive DNA, by J. Jurka
        Jurka, J., Walichiewicz, J. and Milosavljevic, A. (1992) J. Mol. Evol.
        35:286

    (c) HLA - Alignments of human class I and II HLA sequences, by S. Marsh,
        J. Zemmour and P. Parham.
 
    (d) TRANSTERM - Translational termination signal database, by C. Brown
        Brown, C.M., Dalphin, M.E., Stockwell, P.A., Tate, W.P. (1993) Nucleic
        Acids Res. Supplement, submitted

    (e) LISTA - Compilation of nucleotide sequences encoding proteins from
        the yeast Saccharomyces, by P. Linder
        Mosse, M.O., Linder, P., Lazowska, J. and Slonimski, P.P. (1993) Curr.
        Genet. 23, 66.


<4> Updates to Software Collection
    ------------------------------

    Here is a list of new (N) molecular biological programs or updates (U):
    The full path specifications for these files on the EMBL ftp server are
    shown in square brackets.

    DOS:
    ----

    BED.UAA             (N) Sequence editor with speaker feedback (R. Hwan-Seok)
                            [/pub/software/dos/bed.uaa and .uab]

    BOXSHADE.UUE        (N) Plots of multiple alignments with boxes and shades
                            (K. Hofmann)
                            [/pub/software/dos/boxshade.uue]

    CM.UAA              (U) Restriction map construction from multiple digestion
                            data v1.0a (K. Hofmann)
                            [/pub/software/dos/cm.uaa to cm.uai]

    CONSENSE.UUE        (N) Calculation of consensus sequence from alignment
                            (S. Rensing)
                            [/pub/software/dos/consense.uue]

    DIGEST.UAA          (N) Restriction enzyme mapping (R. Nakisa)
                            [/pub/software/dos/digest.uaa and digest.uab]

    DOTPLOT.UAA         (U) Dot plot analysis v3.0 (R. Nakisa)
                            [/pub/software/dos/dotplot.uaa and dotplot.uab]

    EXPLORE.UAA         (N) Analysis of multiple alignments (G. Golding)
                            [/pub/software/dos/explore.uaa and explore.uab]

    HTH.C               (N) Prediction of helix-turn-helix regions (C. Halling)
                            [/pub/software/dos/hth.c]

    RASMOL.UAA          (N) Visualisation of macromolecules using PDB files
                            (R. Sayle)
                            [/pub/software/dos/rasmol.uaa to rasmol.uac]

    SILMUT.UUE          (N) Identification of regions suitable for silent
                            mutagenesis (B. Shankarappa)
                            [/pub/software/dos/silmut.uue]

    SORFIND.UUE         (U) Prediction of exons in vertebrate genomic DNA v1.7
                            (G. Hutchinson)
                            [/pub/software/dos/sorfind.uue]

    WINBLAST.UAA        (N) Windows 3.1 interface for NCBI BLAST server
                            (A. Sivaprasad)
                            [/pub/software/dos/winblast.uaa to winblast.uae]

    WINDOT.UAA          (N) Dot plot analysis under Windows 3.1 (R. Nakisa)
                            [/pub/software/dos/windot.uaa and windot.uab]

    WINIRX.UAA          (N) Windows 3.1 interface for NCBI IRX server
                            [/pub/software/dos/winirx.uaa to winirx.uae]

    WINSEQ.UUE          (N) Windows version of D. Gilbert's READSEQ (R. Nakisa)
                            [/pub/software/dos/winseq.uue]
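
    Several DOS entries above give multi-part files as a range, such as
    "cm.uaa to cm.uai". A short Python sketch for expanding such a range
    into individual file names, assuming the parts are numbered by the
    last letter of the extension with no gaps. The function part_names is
    hypothetical and only illustrates the naming pattern.

```python
import string


def part_names(stem: str, first: str, last: str) -> list[str]:
    """Expand ('cm', 'uaa', 'uai') to ['cm.uaa', 'cm.uab', ..., 'cm.uai']."""
    if len(first) != 3 or len(last) != 3 or first[:2] != last[:2]:
        raise ValueError("extensions must differ only in the final letter")
    letters = string.ascii_lowercase
    # index() raises ValueError if the final character is not a-z.
    start, stop = letters.index(first[2]), letters.index(last[2])
    if start > stop:
        raise ValueError("first part must not come after last part")
    return [f"{stem}.{first[:2]}{c}" for c in letters[start:stop + 1]]


print(part_names("cm", "uaa", "uai"))
```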


    Mac:
    ----

    ALIGN.HQX           (U) Sequence alignments and phylogeny (replacement of
                            corrupt archive) (D. Feng)
                            [/pub/software/mac/align.hqx]

    AMPLIFY.HQX         (U) PCR primer checks v1.2 (B. Engels)
                            [/pub/software/mac/amplify.hqx]

    BKGCUCLC_FONTS.HQX  (N) Color fonts for use with sequence editors (M. Sogin)
                            [/pub/software/mac/bkgcuclc_fonts.hqx]

    DBCONV.HQX          (U) Transformation of line-oriented databases into tab-
                            delimited format v2.0 (J. Valverde)
                            [/pub/software/mac/dbconv.hqx]

    DIGISPEAK.HQX       (N) Sequence reading with sonic digitizer (N. Mantei)
                            [/pub/software/mac/digispeak.hqx]

    DNAID.HQX           (U) DNA sequence editor with grep-like search routine
                            v1.8 (F. Dardel)
                            [/pub/software/mac/dnaid.hqx]

    EMBL-SEARCH.HQX     (U) Database retrieval software for EMBL CD-ROM v2.3.2
                            (EMBL Data Library)
                            [/pub/software/mac/embl-search.hqx]

    EMBL-SEARCH_SRC.HQX (U) Source code for EMBL-Search v2.3.2
                            [/pub/software/mac/embl-search_src.hqx]

    HTH.C               (N) Prediction of helix-turn-helix regions (C. Halling)
                            [/pub/software/mac/hth.c]

    LABHELPER.HQX       (N) General laboratory tools for buffer preps etc.
                            (T. Tzeng)
                            [/pub/software/mac/labhelper.hqx]

    MACCLADE30_DEMO.HQX (N) Demo of phylogenetic analysis program (W. Maddison)
                            [/pub/software/mac/macclade30_demo.hqx]

    MACP12.HQX          (N) Protein property profile plots (A. Luettke)
                            [/pub/software/mac/macp12.hqx]

    MACPATTERN.HQX      (U) Protein pattern searching with PROSITE and
                            BLOCKS database v3.0 (R. Fuchs)
                            [/pub/software/mac/macpattern.hqx]

    MACSTAN.HQX         (N) Random nucleotide sequence generator and analyzer
                            (F. Gast)
                            [/pub/software/mac/macstan.hqx]

    PROTEINSTRUCTURE.HQX (N) Tutorial on protein structures (C. Burchill)
                            [/pub/software/mac/proteinstructure.hqx]

    READSEQ.HQX         (U) Sequence format conversion program (D. Gilbert)
                            [/pub/software/mac/readseq.hqx]

    RRNA-STACK.HQX      (U) Phylogeny of 16/18S rRNA (J. Brown)
                            [/pub/software/mac/rrna-stack.hqx]

    STUFFITLITE.HQX     (U) Compression/decompression/binhex program v3.0.5
                            (R. Lau)                            
                            [/pub/software/mac/stuffitlite.hqx or
                             stuffitlite.sea]

    TOPPPRED.HQX        (N) Prediction of transmembrane segments and their
                            topology (G. v. Heijne, M.G. Claros)

    YEASTSTRAINS.HQX    (U) Strain management, in particular yeast v1.2
                            (K. Froehlich)
                            [/pub/software/mac/yeaststrains.hqx]

    UNIX:
    -----

    BLKSRCH.UUE         (N) Block search analysis of protein sequences with
                            the BLOCKS database (R. Fuchs)
                            [/pub/software/unix/blocksearch.tar.Z]

    BTAB.UAA            (N) BLAST output parser (M. Dubnick)
                            [/pub/software/unix/btab.tar.Z]

    double-digester.tar.Z (N) Graphical analysis of double digest restriction
    (only on FTP server)    data (L. Wright).

    DTASK11S.UAA        (N) Smith-Waterman database searches on parallel work-
                            stations (G. Hauge)
                            [/pub/software/unix/dtask11s.tar.Z]

    FILTER.UAA          (U) Suboptimal alignments and motif recognition
                            (P. Argos)
                            [/pub/software/unix/filter.tar.Z]

    GENAL.UUE           (N) Alignments of genomic sequences with coding regions
                            (J. Stoevlbaek)
                            [/pub/software/unix/genal.tar.Z]

    GM.UAA              (U) Analysis of unknown DNA sequences v2.0 (C. Fields)
                            [/pub/software/unix/gm.tar.Z]

    HTH.C               (N) Prediction of helix-turn-helix regions (C. Halling)
                            [/pub/software/unix/hth.c]

    MAILFASTA.UUE       (U) Script for using EMBL/GenBank Mail-FASTA servers
                            v3.1 (T. deBoer)
                            [/pub/software/unix/mailfasta.tar.Z]

    MOLBIO.UAA          (N) C++ class library for molecular biology (K. Robison)
                            [/pub/software/unix/molbio.tar.Z]

    RASMOL.UAA          (N) Visualisation of macromolecules using PDB files
                            (R. Sayle)
                            [/pub/software/unix/rasmol.tar.Z]

    READSEQ.UAA         (U) Sequence format conversion program (D. Gilbert)
                            [/pub/software/unix/readseq.tar.Z]

    SUPER.UAA           (N) Tool for electronic user polls (R. Doelz)
                            [/pub/software/unix/super.tar.Z]

    VAX:
    ----

    BLKSRCH.UUE         (N) Block search analysis of protein sequences with
                            the BLOCKS database (R. Fuchs)
                            [/pub/software/vax/blksrch.uue]

    FILTER.UAA          (U) Suboptimal alignments and motif recognition
                            (P. Argos)
                            [/pub/software/vax/filter.uaa to filter.uad]

    GCGMENU.UAA         (N) Menu interface to GCG package (C. Gartmann)
                            [/pub/software/vax/gcgmenu.uaa and gcgmenu.uab]

    HTH.C               (N) Prediction of helix-turn-helix regions (C. Halling)
                            [/pub/software/vax/hth.c]

    PDBTOGCG.FOR        (N) Extracts protein sequences from PDB files (M. Mezei)
                            [/pub/software/vax/pdbtogcg.for]

    READSEQ.UAA         (U) Sequence format conversion program (D. Gilbert)
                            [/pub/software/vax/readseq.uaa to readseq.uac]

    SUPER.UUE           (N) Tool for electronic user polls (R. Doelz)
                            [/pub/software/unix/super.uue]

    TFD2GCG.FOR         (N) Converts TFD SITES table to GCG Prosite format
                            (D. Mathog)
                            [/pub/software/vax/tfd2gcg.for]

    UNZIP.UUE           (N) Decompresses ZIP encoded files (H. Smith)
                            [/pub/software/vax/unzip.uue]

<5> Other updates
    -------------

    (a) A new directory that will hold information for crystallographers,
        XRAY. The only file currently present is the list of e-mail addresses
        of crystallographers and related scientists maintained by M. Teeter,
        Boston College.
        E-mail server: directory XRAY
        Anonymous ftp: /pub/databases/xray

    (b) ALIGN directory:

        o DS13648.DAT            - Alignment of plasmodial small subunit
                                   rRNAs.
                                   Submitted by V.Enea, 26-Feb-1993

        o DS13893.DAT            - Alignment of amino acid sequences derived
                                   from conceptual reading frames of Tc1-like
                                   elements from fish, nematodes, fruit flies,
                                   an agnathan, and a spider.
                                   Submitted by D.H.A. Fitch, 18-Mar-1993

        o DS13894.DAT            - Alignment of Tc1-like elements from salmon,
                                   trout, zebrafish, and catfish.
                                   Submitted by D.H.A. Fitch, 18-Mar-1993

        o DS14642.DAT            - Alignment of amino acid sequences of the
                                   zinc-containing long-chain alcohol
                                   dehydrogenases.
                                   Submitted by S. Yokoyama, 17-Jun-1993



<6> Summary of directories on the file server
    ---------------------------------------

    Directories with updated information are marked by an asterisk.

                                           Anonymous ftp          NetServ
                                          --------------         ---------
*   EMBL Nucleotide Sequence Database    /pub/databases/embl       NUC
      (Rel. 35, Jun 93 + updates)
*   Eukaryotic Promoter Database         /pub/databases/epd        EPD
      (Rel. 35, May 93)
*   SwissProt Protein Sequence Database  /pub/databases/swissprot  PROT
       (Rel. 25, Apr 92 + updates)
*   Prosite pattern database             /pub/databases/prosite    PROSITE
       (Rel. 10.1, Apr 93)
*   ENZYME database                      /pub/databases/enzyme     ENZYME
       (Rel. 12.00, Aug 93)
*   Brookhaven Protein Databank          not available             PROTEINDATA
       (Rel. 61, Jul 92 + pre-release)
*   REBASE, Restriction Enzyme Database  /pub/databases/rebase     REBASE
       (Rel. 9307, Jul 93)
*   RELIBRARY, Restriction Enzyme List   /pub/databases/relibrary  RELIBRARY
       (Apr 1993)
*   tRNA sequence and gene sequence db   /pub/databases/trna       TRNA
       (1993)
*   REPBASE - Prototypic sequences for   /pub/databases/repbase    REPBASE
      human repetitive DNA (Rel. 1.01 1992)
*   TFD, Transcription Factor Database   /pub/databases/tfd        TFD
       (Ver 5.5, Nov 92)
*   ECD, E.coli Database                 /pub/databases/ecd        ECD
       (Rel. 14, Feb 93)
    FLYBASE, Drosophila Genetic Map db   /pub/databases/flybase    FLYBASE
       (9209, 8-Sep-1992)
    LiMB, Listing of Mol. Biol. db's     /pub/databases/limb       LIMB
       (Rel. 3.0)
*   SEQANALREF, Seq. analysis refs       /pub/databases/reflist    REFLIST
       (Rel. 38, Apr 93)
    FANS_REF, Functional analysis refs   /pub/databases/reflist    REFLIST
      (Rel. 3.4, Apr 91)
    Alu sequence database and alignment  /pub/databases/alu        ALU
    Haemophilia B database               /pub/databases/haemb      HAEMB
      (Rel. 2, Oct 1992)
    Compilation of small RNA sequences   /pub/databases/smallrna   SMALLRNA
      (Oct 91)
    Berlin Databank of 5S rRNA and       /pub/databases/berlin     BERLIN
      5S rRNA gene sequences (1991)
*   Compilation of small ribosomal       /pub/databases/rrna       RRNA
      subunit RNA sequences (Jun 1993)
    CUTG, codon usage                    /pub/databases/cutg       CUTG
      tabulated from GenBank rel. 69
*   3D_Ali, 3D alignment database        /pub/databases/3d_ali     3D_ALI
      (Jun 1993)
*   RLDB, Reference Library Database     /pub/databases/rldb
      (Jun 1993)                         (ftp only)
*   PKCDD, Protein Kinase Catalytic      /pub/databases/pkcdd      PKCDD
      Domain Database (April 1993)
*   CpG Islands Database                 /pub/databases/cpgisle    CPGISLE
      (Release 2.0, Apr 1992)
*   Blocks database                      /pub/databases/blocks     BLOCKS
      (Rel. 6.0, Jan 1993)
*   HLA, Alignments of human HLA        /pub/databases/hla         HLA
      sequences (Jul 1993)
*   TRANSTERM, Translational            /pub/databases/transterm   TRANSTERM
      Termination Signal Database (Apr 1993)
*   LISTA, Yeast coding sequences       /pub/databases/lista       LISTA
      (Rel. 2.0, Apr 1993)
*   HSSP, sequence-aligned protein       /pub/databases/protein_extras/hssp
      families                                                     PROTEINDATA
*   FSSP, structure-aligned protein      /pub/databases/protein_extras/fssp
      families                           (ftp only)
*   DSSP, protein secondary structures   /pub/databases/protein_extras/dssp
                                                                   PROTEINDATA
    pdb_select, representative sets of   /pub/databases/protein_extras/
                3D proteins (Sep 1992)   (ftp only)                pdb_select
    Misfolded, database of deliberately  /pub/databases/protein_extras/misfolded
       misfolded protein models (Nov 92) (ftp only)

    Software:

    Software for MS-DOS computers        /pub/software/dos         DOS_SOFTWARE
    Software for Apple Macintosh         /pub/software/mac         MAC_SOFTWARE
    Software for UNIX                    /pub/software/unix        UNIX_SOFTWARE
    Software for VAX/VMS                 /pub/software/vax         VAX_SOFTWARE
    Other software                       /pub/software/misc        MISC_SOFTWARE

    Miscellaneous:

*   Technical documents, submission and  /pub/doc                  DOC
      order forms, etc.
*   Multiple DNA sequence alignments     /pub/databases/embl/align ALIGN
      and consensus sequences
    Codon Usage tables                   /pub/databases/codonusage CODONUSAGE
*   Crystallographers' information       /pub/databases/xray       XRAY


<7> Getting Started ?
    -----------------
    
    For initial information, send standard electronic mail to the address:
      NetServ@EMBL-Heidelberg.DE
    containing just the word HELP on a line by itself.

    To use the anonymous ftp server, connect to the internet address
      FTP.EMBL-Heidelberg.DE
    using the username "anonymous" (without the quotes !) and giving your
    e-mail address as the password. Look in the directory /pub/help for
    various help files.
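
    The anonymous login described above can be sketched with Python's
    standard ftplib module. The host name is the one given in this
    newsletter and may no longer answer; the e-mail address is a
    placeholder, and the helper list_help_files is made up for this
    illustration.

```python
from ftplib import FTP

HOST = "FTP.EMBL-Heidelberg.DE"  # address as printed in the newsletter


def list_help_files(ftp, email: str) -> list[str]:
    """Log in anonymously and return the file names in /pub/help.

    `ftp` is any object offering ftplib.FTP's login, cwd and nlst methods.
    """
    ftp.login(user="anonymous", passwd=email)  # e-mail address as password
    ftp.cwd("/pub/help")
    return ftp.nlst()


def main() -> None:
    # Needs network access; not called automatically.
    with FTP(HOST, timeout=30) as ftp:
        for name in list_help_files(ftp, "you@example.org"):
            print(name)
```

    Calling main() would attempt the real connection. Passing the
    connection in as an argument keeps the login steps easy to try out
    against a stand-in object.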

    To use the Gopher server, open a connection to FTP.EMBL-Heidelberg.DE
    at the standard Gopher port 70.
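
    For readers without a Gopher client, a rough Python sketch of what
    such a connection exchanges, following the Gopher protocol as
    described in RFC 1436: the client sends a selector string ending in
    CR LF (an empty selector asks for the top-level menu), and each menu
    line holds a type character, a display string, a selector, a host and
    a port separated by tabs. The function names are illustrative, and
    nothing here is specific to the EMBL server.

```python
def gopher_request(selector: str = "") -> bytes:
    """Bytes a client sends after connecting; empty selector = root menu."""
    return selector.encode("ascii") + b"\r\n"


def parse_menu_line(line: str):
    """Split one menu line into (type, display, selector, host, port).

    Returns None for the terminating '.' line and for malformed lines.
    """
    line = line.rstrip("\r\n")
    if not line or line == ".":
        return None
    item_type, rest = line[0], line[1:]
    fields = rest.split("\t")
    if len(fields) < 4 or not fields[3].isdigit():
        return None
    display, selector, host, port = fields[:4]
    return item_type, display, selector, host, int(port)


print(gopher_request())
print(parse_menu_line("1Help files\t/pub/help\tgopher.example.org\t70"))
```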


<8> Network addresses at EMBL
    -------------------------

    EMBL File Server (e-mail requests)        NetServ@EMBL-Heidelberg.DE
    Anonymous FTP                             FTP.EMBL-Heidelberg.DE
    BLITZ e-mail server                       Blitz@EMBL-Heidelberg.DE
    FASTA e-mail server                       FASTA@EMBL-Heidelberg.DE
    Quicksearch e-mail server                 Quick@EMBL-Heidelberg.DE

    Problems, feedback (human contact)        NetHelp@EMBL-Heidelberg.DE
    EMBL Data Library enquiries               DataLib@EMBL-Heidelberg.DE
    EMBL Data Library sequence submissions    DataSubs@EMBL-Heidelberg.DE
    Software submissions and problems         Software@EMBL-Heidelberg.DE

From owner-embldatabank@net.bio.net Tue Jul 06 23:00:00 1993
Path: biosci!agate!spool.mu.edu!uunet!newsflash.concordia.ca!mizar.cc.umanitoba.ca!frist
From: frist@ccu.umanitoba.ca
Newsgroups: bionet.molbio.genbank,bionet.molbio.embldatabank
Subject: "source" feature key
Keywords: GenBank EMBL DDBJ feature table
Message-ID: <C9t2zA.Gxv@ccu.umanitoba.ca>
Date: 7 Jul 93 17:52:22 GMT
Sender: news@ccu.umanitoba.ca
Followup-To: bionet.molbio.genbank
Organization: University of Manitoba, Winnipeg, Manitoba, Canada
Lines: 84
Xref: biosci bionet.molbio.genbank:1335 bionet.molbio.embldatabank:206
Nntp-Posting-Host: norton.cc.umanitoba.ca

The current Feature Table Definition (Version 1.04) describes a new feature
key:

- - - - - - - - - - - - - - (from FT Definition) - - - - - - - - - - - - 
    Feature key        source

    Definition: identifies the biological source of the specified span of the
    sequence. This key is mandatory. Every entry will have, as a minimum,
    a single source key spanning the entire sequence. More than one source
    key per sequence is permittable.

    Mandatory qualifiers:  /organism="text"

    Optional qualifiers:   a whole bunch, including /label=feature_label
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

This is a very welcome addition to the Feature Table (FT) language. However,
I have some questions.

1) Source does not appear to be implemented as of GenBank Release 77.0.
Is it likely to be implemented soon? Has it only been implemented in new
entries, so that I just haven't seen it yet?

2) Can we have a few working examples of source in use? There are a number
of usage conventions that need to be agreed upon. Let me propose that 
it might be very convenient if the 'source' label field was just the 
ORIGINAL accession number for that span:

ORIGINAL entry:
     source                1..1089
                           /label=X12345
                           /organism="Unicornus mythicalis"
     exon                  103..712
                           /label=X12345:exon1
                           /note="This feature added for illustrative
                           purposes" 

SAME FEATURES AFTER MERGER INTO LARGER ENTRY:
     source                134231..135319
                           /label=X12345
                           /organism="Unicornus mythicalis"
     exon                  134333..134942
                           /label=X12345:exon1
                           /note="This feature added for illustrative
                           purposes" 

Note that I am proposing the convention that all features other than
source have labels that are FT expressions, incorporating the original
accession number. This convention ensures that each label is UNIQUE
ACROSS THE ENTIRE DATABASE, and NEVER CHANGES. To illustrate a point,
one might merge a bunch of sequences representing a chromosomal region
into a larger entry. Prior to merger, many of the component entries might
have features labeled "exon1", and so their labels would be ambiguous. 
Incorporating the accession # into the label prevents this problem.
Another ramification of the conventions I have shown is that even base
ranges remain valid. For example:

X12345:103..712 

would still return the same exon from the original entry
prior to merger, because the software can reconstruct X12345 from the
new entry.

It is my feeling that we need to build mechanisms like this into
the databases that will protect FT expressions from obsolescence.
Using conventions such as this, one can maintain lists of expressions
that will always retrieve the same sequence, regardless of how much
merging and correction goes on among entries. 
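
To make the idea concrete, here is a small Python sketch of how software
might map an expression such as X12345:103..712 onto a merged entry by
using the source feature labeled with the original accession number. It
assumes the original sequence was merged in the same orientation and
without insertions or deletions inside the span. The names Source and
resolve are invented for this illustration and are not part of any
existing package.

```python
import re
from typing import NamedTuple


class Source(NamedTuple):
    label: str   # original accession number, e.g. "X12345"
    start: int   # first base of the span in the merged entry
    end: int     # last base of the span in the merged entry


_EXPR = re.compile(r"^(\w+):(\d+)\.\.(\d+)$")


def resolve(expr: str, sources: list[Source]) -> tuple[int, int]:
    """Translate 'ACC:lo..hi' into coordinates of the merged entry."""
    m = _EXPR.match(expr)
    if not m:
        raise ValueError(f"not of the form ACC:lo..hi: {expr!r}")
    label, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
    for src in sources:
        if src.label == label:
            offset = src.start - 1          # base 1 of ACC sits at src.start
            new_lo, new_hi = lo + offset, hi + offset
            if lo < 1 or lo > hi or new_hi > src.end:
                raise ValueError("range falls outside the source span")
            return new_lo, new_hi
    raise KeyError(label)


merged_sources = [Source("X12345", 134231, 135319)]
print(resolve("X12345:103..712", merged_sources))
```

With the numbers from the example above, the exon at 103..712 in the
original entry comes out at 134333..134942 in the merged entry.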

3) The wording of the definition implies that every entry must now have
a Feature Table, even if only to have a source field. Is that true?

I guess I've done more than simply ask questions here, but I think
these points were worth bringing up. 

===============================================================================
Brian Fristensky                | 
Department of Plant Science     |  A question is like a knife that slices
University of Manitoba          |  through the stage backdrop and gives us
Winnipeg, MB R3T 2N2  CANADA    |  a look at what lies hidden behind.
frist@ccu.umanitoba.ca          |  
Office phone:   204-474-6085    |  Milan Kundera, THE UNBEARABLE LIGHTNESS 
FAX:            204-261-5732    |  OF BEING
===============================================================================

From owner-embldatabank@net.bio.net Tue Jul 06 23:00:00 1993
Path: biosci!EMBL-Heidelberg.DE
From: Rainer.Fuchs@EMBL-Heidelberg.DE (Rainer Fuchs (EMBL Data Library))
Newsgroups: bionet.announce,bionet.molbio.embldatabank
Subject: EMBL Newsletter 10
Message-ID: <1993Jul7.145406.100469@embl-heidelberg.de>
Date: 7 Jul 93 13:55:04 GMT
Sender: kristoff@net.bio.net
Reply-To: NetHelp@EMBL-Heidelberg.DE
Followup-To: bionet.molbio.embldatabank
Organization: EMBL, European Molecular Biology Laboratory
Lines: 481
Approved: bionews-moderator@net.bio.net
Xref: biosci bionet.announce:610 bionet.molbio.embldatabank:205


------------------------------------------------------------------------------
|  EMBL FILE SERVER News                            Number 10, July 7th 1993 |
|                                                                            |
|  European Molecular Biology Laboratory, Data Library & Computer Group,     |
|  Postfach 10.2209, 69012 Heidelberg, Germany.                              |
|                                             Tel: +49 6221 387258           |
|  E-mail: NetHelp@EMBL-Heidelberg.DE         Fax: +49 6221 387519           |
------------------------------------------------------------------------------


Contents:

 <1> Introduction
 <2> Blitz e-mail server
 <3> Updates to data collections
 <4> Updates to software collection
 <5> Other updates
 <6> Summary of directories on the file server
 <7> Getting started ?
 <8> Network addresses at EMBL


<1> Introduction
    ------------

    The EMBL File Server is a facility available on the EMBL computing system
    for external users to request files by electronic mail, anonymous FTP or
    Gopher. The service is free.

<2> Blitz e-mail server
    -------------------
    
    BLITZ is an automatic electronic mail server for the MPsrch program of 
    Shane Sturrock and John Collins, Biocomputing Research Unit, University
    of Edinburgh, Scotland.  

    MPsrch allows you to perform sensitive and extremely fast comparisons of
    your protein sequences against the Swiss-Prot protein sequence database
    using the Smith and Waterman best local similarity algorithm. It runs
    on the MasPar family of massively parallel machines; the BLITZ server
    uses a 4096-processor MasPar MP-1 system.   A typical search time for a
    query sequence of 400 amino acids is approximately 40 seconds to search
    the entire Swiss-Prot database.
    Additional time is required to reconstruct the alignments; the time for
    this will depend on the number of alignments requested.  MPsrch is the
    fastest implementation of the SW algorithm currently available on any
    machine.  

    How to use BLITZ
    ----------------

    Send a properly formatted electronic mail message to 

                    BLITZ@EMBL-Heidelberg.DE

    containing required commands and parameters and the answer will be 
    automatically mailed to you. To obtain full documentation of the service
    and algorithms used, send a message containing just the word HELP.

    If you have any problems using the BLITZ service, or any questions, please
    send them to:
                    NETHELP@EMBL-Heidelberg.DE

    Example
    -------

    SEQ
      STKKKPLTQE QLEDARRLKA IYEKKKNELG LSQESVADKM GMGQSGVGAL
      FNGINALNAY NAALLAKILK VSVEEFSPSI AREIYEMYEA VSMQPSLRSE
      YEYPVFSHVQ AGMFSPELRT FTKGDAERWV STTKKASDSA FWLEVEGNSM
      TAPTGSKPSF PDGMLILVDP EQAVEPGDFC IARLGGDEFT FKKLIRDSGQ
      VFLQPLNPQY PMIPCNESCS VVGKVIASQW PEETFG
    END
 
    The possible parameters (e.g. gap penalty, weight matrix) are explained in
    the HELP documentation.
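
    As an illustration only, the SEQ ... END layout of the example above can
    be produced by a few lines of Python. This is a minimal sketch: the
    function name format_blitz_query and its arguments are hypothetical, the
    grouping into blocks of ten residues simply mirrors the example, and no
    search parameters are emitted because their syntax is given only in the
    HELP documentation.

```python
def format_blitz_query(sequence: str, block: int = 10, blocks_per_line: int = 5) -> str:
    """Wrap a protein sequence in the SEQ ... END layout shown in the example.

    Assumption: only the SEQ and END keywords are used; optional parameters
    are not handled here.
    """
    # Drop any whitespace already present in the input and normalise case.
    residues = "".join(sequence.split()).upper()
    if not residues.isalpha():
        raise ValueError("sequence must be non-empty and contain only letters")

    # Cut the sequence into fixed-size blocks, then group blocks into lines.
    blocks = [residues[i:i + block] for i in range(0, len(residues), block)]
    lines = [
        "  " + " ".join(blocks[i:i + blocks_per_line])
        for i in range(0, len(blocks), blocks_per_line)
    ]
    return "\n".join(["SEQ", *lines, "END"]) + "\n"


if __name__ == "__main__":
    # The resulting text would form the body of the mail message.
    print(format_blitz_query("STKKKPLTQEQLEDARRLKAIYEKKKNELG"), end="")
```

    The returned text is meant to be pasted into the body of a message
    addressed to the BLITZ server; sending the mail is left to the user's
    own mail program.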


<3> Updates to Data Collections
    ---------------------------

    New databases have been added to the file server since Newsletter issue 9:

    (a) RELIBRARY - Restriction Enzyme List, by E. Raschke
        Raschke, E. (1993) Comprehensive restriction enzyme lists to update
        any DNA sequence computer program. Genetic Analysis, Techniques and
        Applications, Vol. 10, in press.

    (b) REPBASE - Prototypic sequences for human repetitive DNA, by J. Jurka
        Jurka, J., Walichiewicz, J. and Milosavljevic, A. (1992) J. Mol. Evol.
        35:286

    (c) HLA - Alignments of human class I and II HLA sequences, by S. Marsh,
        J. Zemmour and P. Parham.
 
    (d) TRANSTERM - Translational termination signal database, by C. Brown
        Brown, C.M., Dalphin, M.E., Stockwell, P.A., Tate, W.P. (1993) Nucleic
        Acids Res. Supplement, submitted

    (e) LISTA - Compilation of nucleotide sequences encoding proteins from
        the yeast Saccharomyces, by P. Linder
        Mosse, M.O., Linder, P., Lazowska, J. and Slonimski, P.P. (1993) Curr.
        Genet. 23, 66.


<4> Updates to Software Collection
    ------------------------------

    Here is a list of new (N) and updated (U) molecular biological programs.
    The full path specifications for these files on the EMBL ftp server are
    shown in square brackets.

    DOS:
    ----

    BED.UAA             (N) Sequence editor with speaker feedback (R. Hwan-Seok)
                            [/pub/software/dos/bed.uaa and .uab]

    BOXSHADE.UUE        (N) Plots of multiple alignments with boxes and shades
                            (K. Hofmann)
                            [/pub/software/dos/boxshade.uue]

    CM.UAA              (U) Restriction map construction from multiple digestion
                            data v1.0a (K. Hofmann)
                            [/pub/software/dos/cm.uaa to cm.uai]

    CONSENSE.UUE        (N) Calculation of consensus sequence from alignment
                            (S. Rensing)
                            [/pub/software/dos/consense.uue]

    DIGEST.UAA          (N) Restriction enzyme mapping (R. Nakisa)
                            [/pub/software/dos/digest.uaa and digest.uab]

    DOTPLOT.UAA         (U) Dot plot analysis v3.0 (R. Nakisa)
                            [/pub/software/dos/dotplot.uaa and dotplot.uaa]

    EXPLORE.UAA         (N) Analysis of multiple alignments (G. Golding)
                            [/pub/software/dos/explore.uaa and explore.uab]

    HTH.C               (N) Prediction of helix-turn-helix regions (C. Halling)
                            [/pub/software/dos/hth.c]

    RASMOL.UAA          (N) Visualisation of macromolecules using PDB files
                            (R. Sayle)
                            [/pub/software/dos/rasmol.uaa to rasmol.uac]

    SILMUT.UUE          (N) Identification of regions suitable for silent
                            mutagenesis (B. Shankarappa)
                            [/pub/software/dos/silmut.uue]

    SORFIND.UUE         (U) Prediction of exons in vertebrate genomic DNA v1.7
                            (G. Hutchinson)
                            [/pub/software/dos/sorfind.uue]

    WINBLAST.UAA        (N) Windows 3.1 interface for NCBI BLAST server
                            (A. Sivaprasad)
                            [/pub/software/dos/winblast.uaa to winblast.uae]

    WINDOT.UAA          (N) Dot plot analysis under Windows 3.1 (R. Nakisa)
                            [/pub/software/dos/windot.uaa and windot.uab]

    WINIRX.UAA          (N) Windows 3.1 interface for NCBI IRX server
                            [/pub/software/dos/winirx.uaa to winirx.uae]

    WINSEQ.UUE          (N) Windows version of D. Gilbert's READSEQ (R. Nakisa)
                            [/pub/software/dos/winseq.uue]


    Mac:
    ----

    ALIGN.HQX           (U) Sequence alignments and phylogeny (replacement of
                            corrupt archive) (D. Feng)
                            [/pub/software/mac/align.hqx]

    AMPLIFY.HQX         (U) PCR primer checks v1.2 (B. Engels)
                            [/pub/software/mac/amplify.hqx]

    BKGCUCLC_FONTS.HQX  (N) Color fonts for use with sequence editors (M. Sogin)
                            [/pub/software/mac/bkgcuclc_fonts.hqx]

    DBCONV.HQX          (U) Transformation of line-oriented databases into tab-
                            delimited format v2.0 (J. Valverde)
                            [/pub/software/mac/dbconb.hqx]

    DIGISPEAK.HQX       (N) Sequence reading with sonic digitizer (N. Mantei)
                            [/pub/software/mac/digispeak.hqx]

    DNAID.HQX           (U) DNA sequence editor with grep-like search routine
                            v1.8 (F. Dardel)
                            [/pub/software/mac/dnaid.hqx]

    EMBL-SEARCH.HQX     (U) Database retrieval software for EMBL CD-ROM v2.3.2
                            (EMBL Data Library)
                            [/pub/software/mac/embl-search.hqx]

    EMBL-SEARCH_SRC.HQX (U) Source code for EMBL-Search v2.3.2
                            [/pub/software/mac/embl-search_src.hqx]

    HTH.C               (N) Prediction of helix-turn-helix regions (C. Halling)
                            [/pub/software/mac/hth.c]

    LABHELPER.HQX       (N) General laboratory tools for buffer preps etc.
                            (T. Tzeng)
                            [/pub/software/mac/labhelper.hqx]

    MACCLADE30_DEMO.HQX (N) Demo of phylogenetic analysis program (W. Maddison)
                            [/pub/software/mac/macclade30_dem
From owner-embldatabank@net.bio.net Tue Jul 06 23:00:00 1993
Path: biosci!uwm.edu!cs.utexas.edu!uunet!pipex!uknet!pavo.csi.cam.ac.uk!mrc-lmb.cam.ac.uk!rs
From: rs@mrc-lmb.cam.ac.uk (Staden R.)
Newsgroups: bionet.molbio.embldatabank,bionet.software
Subject: Re: Use of sequence library indexes
Message-ID: <1993Jul7.135349.4203@infodev.cam.ac.uk>
Date: 7 Jul 93 13:53:49 GMT
References: <1993Jul1.234108.1@molbiol.ox.ac.uk> <1993Jul2.055206.7673@comp.bioz.unibas.ch>
Sender: news@infodev.cam.ac.uk (USENET news)
Organization: MRC Laboratory of Molecular Biology, Cambridge UK
Lines: 39
Xref: biosci bionet.molbio.embldatabank:204 bionet.software:5395
Nntp-Posting-Host: al.mrc-lmb.cam.ac.uk

There have been several articles about the use of indexes for extracting
entries from the sequence libraries and the problems that GCG has when
an accession number occurs in more than one entry.

Mention was made of a package (SRS) that does not have this problem.

Readers may be interested to know of our own method of dealing with
sequence library indexes (and which also does not suffer from the
accession number problem).

We decided to use the indexing system that is included on the EMBL
CD-ROM, and using it we can extract entries based on accession number
and entry name. In addition we can perform instantaneous author and
text searches (the text indexes include every non-trivial word throughout
an entry - not just the keywords - so we find it very useful).

So this allows us to use EMBL and SWISSPROT from the CD-ROM (or copied
to disk for extra speed), but we also wanted to be able to use EMBL updates
and PIR and GenBank. To make this possible we wrote software to
create EMBL CD-ROM style indexes for all libraries, i.e. we create
entryname, accession number, author and freetext indexes for all
library formats. It is important to realise that we do not change
the libraries, but leave them as distributed. Not having to reformat
or change the libraries obviously saves a great deal of time and
temporary disk space.
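
To illustrate the general idea (this is not the EMBL CD-ROM index format,
nor the software described here), the Python sketch below builds an
accession number index over an unmodified EMBL-style flat file. It assumes
entries start with an "ID" line, list their accession numbers on "AC"
lines and end with "//". The function name and the sample entry names are
hypothetical. Each accession number maps to a list of byte offsets, so a
number that occurs in more than one entry is not a problem.

```python
import io
from collections import defaultdict
from typing import BinaryIO, Dict, List


def build_accession_index(library: BinaryIO) -> Dict[str, List[int]]:
    """Map each accession number to the byte offsets of all entries listing it.

    The library file is only read, never rewritten or reformatted.
    """
    index: Dict[str, List[int]] = defaultdict(list)
    entry_start = None   # byte offset of the current entry's ID line
    offset = 0           # byte offset of the line being examined
    for raw in library:
        line = raw.decode("ascii", errors="replace")
        if line.startswith("ID "):
            entry_start = offset
        elif line.startswith("AC ") and entry_start is not None:
            # Accession numbers are separated by semicolons on the AC line.
            for acc in line[2:].replace(";", " ").split():
                if entry_start not in index[acc]:
                    index[acc].append(entry_start)
        elif line.startswith("//"):
            entry_start = None
        offset += len(raw)
    return dict(index)


# Two toy entries sharing a secondary accession number (entry names made up).
SAMPLE = (
    b"ID   ENTRYA\n"
    b"AC   M18693; J03516; M18691;\n"
    b"//\n"
    b"ID   ENTRYB\n"
    b"AC   M18694; J03516; M18691;\n"
    b"//\n"
)
index = build_accession_index(io.BytesIO(SAMPLE))
print(index["J03516"])  # one offset per entry that lists J03516
```

To fetch an entry, a program would seek to a stored offset in the original
library file and read up to the next "//" line.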

An article describing the initial work on this subject was buried
in Staden, R. and Dear, S. "Indexing the sequence libraries: software
providing a common indexing system for all the standard sequence
libraries." DNA Sequence 3, 99-105 (1992).

Rodger Staden



-- 
Rodger Staden, Medical Research Council Laboratory Of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, ENGLAND     Telephone: 0223 402389
Internet: rs@UK.AC.Cam.MRC-LMB              Facsimile: 0223 412282

From owner-embldatabank@net.bio.net Wed Jul 07 23:00:00 1993
Path: biosci!U.WASHINGTON.EDU!nok
From: nok@U.WASHINGTON.EDU (Supachitra Chadchawan)
Newsgroups: bionet.molbio.embldatabank
Subject: I would like to know how to submit the sequence to embl-databank.  Can anyone tell me, please?  Thank you in advance.
Message-ID: <Pine.3.05z.9307081526.A23509-0100000@carson.u.washington.edu>
Date: 8 Jul 93 22:37:26 GMT
Sender: daemon@net.bio.net
Distribution: bionet
Lines: 1



From owner-embldatabank@net.bio.net Sun Jul 18 23:00:00 1993
Path: biosci!uwm.edu!cs.utexas.edu!usc!howland.reston.ans.net!noc.near.net!uunet!mcsun!sun4nl!news.sara.nl!HASARA11.SARA.NL!A428ENDE
From: A428ENDE@HASARA11.SARA.NL
Newsgroups: bionet.molbio.embldatabank
Subject: <none>
Message-ID: <16C10F1D6.A428ENDE@HASARA11.SARA.NL>
Date: 19 Jul 93 21:13:58 GMT
Organization: S.A.R.A. Academic Computing Services Amsterdam
Lines: 8
Nntp-Posting-Host: vm1.sara.nl
X-Newsreader: NNR/VM S_1.3.2

Dear Sir,
Because I am not familiar with the GCG programs I wrote my own program to
parse the feature table of sequences and so tabulate the codon usage in the
coding strand. This works well but I found out that there is a strange
discrepancy between sequences. About half of them contained in the CDS
parameters also the stop codon, but the other half stopped just before the
stop codon. Why is this?
Henk van de Kamer
Email: CHLAMY@SARA.NL

From owner-embldatabank@net.bio.net Sun Jul 18 23:00:00 1993
Path: biosci!UUNET.UU.NET!aditya!raman
From: aditya!raman@UUNET.UU.NET (K. Ramnarayan)
Newsgroups: bionet.molbio.embldatabank
Subject: (none)
Message-ID: <9307200101.AA03516@aditya>
Date: 20 Jul 93 01:01:44 GMT
Sender: daemon@net.bio.net
Distribution: bionet
Lines: 5

Hello,
I was wondering if anyone knows whether any IgG class has been identified
for dolphins?  Thanks very much for your help.
..Ramnarayan
raman%aditya@uunet.uu.net

From owner-embldatabank@net.bio.net Thu Jul 22 23:00:00 1993
Path: biosci!bcm!cs.utexas.edu!usc!math.ohio-state.edu!uwm.edu!linac!uchinews!kimbark!chh9
From: chh9@kimbark.uchicago.edu (Conrad Halling)
Newsgroups: bionet.molbio.embldatabank
Subject: Re: <none>
Message-ID: <1993Jul23.170509.18779@midway.uchicago.edu>
Date: 23 Jul 93 17:05:09 GMT
References: <16C10F1D6.A428ENDE@HASARA11.SARA.NL>
Sender: news@uchinews.uchicago.edu (News System)
Reply-To: chh9@midway.uchicago.edu
Organization: University of Chicago
Lines: 31

In article <16C10F1D6.A428ENDE@HASARA11.SARA.NL> 
     A428ENDE@HASARA11.SARA.NL writes:

>Because I am not familiar with the GCG programs I wrote my own program to 
>parse the feature table of sequences and so tabulate the codon usage in the 
>coding strand. This works well but I found out that there is a strange 
>discrepancy between sequences. About half of them contained in the CDS 
>parameters also the stop codon, but the other half stopped just before the 
>stop codon. Why is this?

I asked this question myself a few months ago because I, too, have written a 
program that counts codons.

The answer is that EMBL's old definition of the CDS (coding sequence) did NOT
include the stop codon.  The new standard definition does.  Unfortunately,
EMBL is behind on fixing the old sequences.  Many of these sequences have
been transferred to GenBank, so there are some GenBank entries for which
the CDS feature does not include the stop codon.

The only solution for now is to count the codons, then see if you have
a stop codon.  If you don't, check the next codon after the end to see
if it's a stop codon.

As a warning, you will have to check to be sure that there is only one
stop codon in a CDS range.  You should also check that when you have one
stop codon that it is at the end of the CDS range.  There are several
entries in which the CDS range is incorrect.
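
As a rough illustration, the checks above could look like this in Python.
It is only a sketch under stated assumptions: the standard genetic code
(TAA, TAG, TGA as stops), a CDS given as a single forward-strand range
with 1-based inclusive coordinates (no join() or complement() locations),
and a function name, classify_cds, that I made up for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def classify_cds(sequence: str, start: int, end: int) -> str:
    """Classify a forward-strand CDS given 1-based inclusive coordinates."""
    seq = sequence.upper().replace("U", "T")
    cds = seq[start - 1:end]
    if len(cds) % 3 != 0:
        return "length not a multiple of three"

    # Count the codons and note where any stop codons fall.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    stops = [i for i, codon in enumerate(codons) if codon in STOP_CODONS]

    if stops == [len(codons) - 1]:
        return "stop included"              # new-style CDS definition
    if stops:
        return "internal or multiple stop"  # the CDS range is suspect
    # No stop inside the range: look at the codon just after the end.
    if seq[end:end + 3] in STOP_CODONS:
        return "stop follows"               # old-style CDS definition
    return "no stop found"


if __name__ == "__main__":
    toy = "ATGAAATAAGGG"
    print(classify_cds(toy, 1, 9))
    print(classify_cds(toy, 1, 6))
```

Entries that come back as anything other than "stop included" or "stop
follows" are the ones worth inspecting by hand.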

-- 
Conrad Halling
c-halling@uchicago.edu

From owner-embldatabank@net.bio.net Thu Jul 29 23:00:00 1993
Path: biosci!UICVM.UIC.EDU!CAMPALX%BRUFMG
From: CAMPALX%BRUFMG@UICVM.UIC.EDU (alex)
Newsgroups: bionet.molbio.embldatabank
Subject: get sequences
Message-ID: <9307302316.AA15088@net.bio.net>
Date: 30 Jul 93 23:16:20 GMT
Sender: daemon@net.bio.net
Distribution: bionet
Lines: 1

get nuc:M86748

From owner-embldatabank@net.bio.net Thu Jul 29 23:00:00 1993
Path: biosci!UICVM.UIC.EDU!CAMPALX%BRUFMG
From: CAMPALX%BRUFMG@UICVM.UIC.EDU (alex)
Newsgroups: bionet.molbio.embldatabank
Subject: Informations
Message-ID: <9307302318.AA15219@net.bio.net>
Date: 30 Jul 93 23:18:48 GMT
Sender: daemon@net.bio.net
Distribution: bionet
Lines: 17

Dear Sir,
I tried to get some information from this data library
at the address described in NAR 20 (suppl), 1991.
I have not received any help or answer to the message
I sent.
I am very interested in getting all sequences of Campylobacter
and Helicobacter microorganisms. I would be very grateful if you
could send the records for these bacteria by BITNET, or tell me
how to get these sequences.
Best Regards,
Alexandro C. T. Carvalho
Department of Microbiology-Instituto de Ciencias
Biologicas da UFMG, Belo Horizonte-MG. BRAZIL
C.P 2486- 31270-901.
FAX 4411412
BITNET CAMPALX@BRUFMG
INTERNET CAMPALX@VM1.LCC.UFMG.BR

