GePSAN newsletter issue 1

GEPSAN at CMU.UNIGE.CH
Fri Jan 11 02:49:00 EST 1991


=============================================================================


        ***********************************************
        * GePSAN                                      *
        * Geneva Protein Sequence Analysis Newsletter *
        ***********************************************

Published by: Amos Bairoch
              Dept. Medical Biochemistry / University of Geneva.
              Switzerland

Volume 1, Number 1 /  January 1991



 To subscribe (or unsubscribe) to this newsletter: gepsan at cgecmu51.bitnet
 To send comments/suggestions/criticisms:          bairoch at cgecmu51.bitnet



 Data base availability summary

+------------+-------+------------------------------------------------------+
| Data base  | Rel.  |  Email              FTP        FTP     Tape   CD-ROM |
|            |       |  EMBL File Server   GenBank    NCBI                  |
+------------+-------+------------------------------------------------------+
| SWISS-PROT | 16.0  |  Yes (by entry)     Soon       Yes     Yes    Yes    |
| ENZYME     |  3.0  |  Yes                Yes        Yes     Yes    Yes    |
| PROSITE    |  6.0  |  Yes                Yes        Yes     Yes    Yes    |
| SEQANALREF | 13.5  |  Yes                Yes        Yes     No     No     |
+------------+-------+------------------------------------------------------+

SWISS-PROT/PROSITE/ENZYME tapes or CD-ROM subscription: datalib at embl.bitnet

EMBL file server email address:      netserv at embl.bitnet
GenBank On-line Service FTP address: genbank.bio.net (or 134.172.1.160)
NCBI FTP address:                    ncbi.nlm.nih.gov (or 130.14.20.1)


=============================================================================
=============================================================================


                        TABLE OF CONTENTS
                Volume 1, Number 1 /  January 1991


 1. What is GePSAN.
 2. SWISS-PROT news.
 3. Cross-references to OMIM in SWISS-PROT and ENZYME.
 4. Biomolecular databases integration: current status.
 5. NCBI, the GenInfo Backbone Database, and the ASN.1 syntax.
 6. PROSITE news.
 7. Updated list of public domain programs which make use of PROSITE.
 8. ENZYME news.
 9. Specialized databases part 1: the P450 database.


=============================================================================
=============================================================================

Section: 1
Title  : What is GePSAN.

GePSAN is a newsletter that  deals  with aspects of protein sequence analysis
that are relevant to the data bases that are  maintained at the Department of
Medical Biochemistry (DMB) of the University of Geneva, namely:

    SWISS-PROT: An annotated protein sequence data base. A joint project of
                the DMB and of the EMBL Data Library.
    PROSITE   : A dictionary of sites and patterns in proteins.
    ENZYME    : An enzyme nomenclature data base.
    SEQANALREF: A sequence analysis bibliographic reference data base.

This newsletter will also attempt to report new  developments in the field of
protein sequence analysis.

=============================================================================
=============================================================================

Section: 2
Title  : SWISS-PROT news.

1) Release 16
=============

Release 16.0  of SWISS-PROT  contains 18364 sequence entries, comprising
5'986'949 amino  acids abstracted from 17763 references. This represents
an increase of 9% over release 15.   More  than 1400 sequences have been
added since release 15,  the sequence data of  271 existing  entries has
been updated  and the annotations of 3500 entries  have been revised. In
particular, we have used review articles to update the annotations of
the following groups or families of proteins:

   -  Alpha and beta adrenergic receptors
   -  Arrestins
   -  Chromogranins / secretogranins
   -  CTF/NF-I family
   -  ClpP proteases
   -  ets family
   -  GABA(A) receptors
   -  Gram-positive cocci surface proteins
   -  Hexokinases
   -  Integrins alpha and beta chains
   -  NMePhe pili proteins
   -  p53 proteins
   -  Poly(ADP-ribose) polymerase
   -  Profilins
   -  S-Adenosylmethionine synthetases
   -  Site-specific recombinases
   -  Synaptobrevins
   -  Type-II membrane antigens
   -  UDP-glucuronosyl transferases
   -  Uteroglobin family
   -  LBP / BPI / CETP family

We have finished adding cross-references to human protein sequence entries
which are represented in the latest edition of OMIM  (see the next section
for full details).

2) Future developments
======================

One question many users of SWISS-PROT ask me is: what is the exact extent of
the overlap between SWISS-PROT and PIR? Up to now, cross-references (DR
lines) were provided only to entries in the annotated section of PIR (which
is now called PIR1) and for which we provide a complete overlap. Only a few
cross-references were provided to entries in the unannotated sections of PIR
(which used to be called "NEW", but are now known as PIR2 and PIR3). We
started to add these cross-references in release 16; this task will continue
in release 17 and be completed for release 18. At that point, users who do
not want to scan two protein data banks will be able to automatically
extract from PIR2/PIR3 all the sequences that are not present in SWISS-PROT
and to produce a file that complements SWISS-PROT. In a future issue of this
newsletter we will explain this process in detail and also describe the
exact differences between SWISS-PROT and PIR.
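
The extraction described above could be sketched in Python as follows. This
is only an illustration, not the procedure we will publish: it assumes that
the cross-references are read from DR lines of the form
"DR   PIR; accession; entry name." (as in the example entry of section 3),
and the function names and the code "B99999" are made up for the example.

```python
def pir_codes_in_swissprot(swissprot_lines):
    """Collect the PIR accession numbers and entry names cited in DR lines."""
    cited = set()
    for line in swissprot_lines:
        if not line.startswith("DR   PIR;"):
            continue
        # Assumed line layout: DR   PIR; accession; entry name.
        fields = line[5:].rstrip().rstrip(".").split(";")
        cited.update(f.strip() for f in fields[1:] if f.strip())
    return cited


def complement(pir_entry_codes, swissprot_lines):
    """Return the PIR codes that SWISS-PROT does not cross-reference."""
    cited = pir_codes_in_swissprot(swissprot_lines)
    return [code for code in pir_entry_codes if code not in cited]


dr_lines = [
    "DR   EMBL; Y00339; HSCA2.",
    "DR   PIR; A01141; CRHU2.",
    "DR   PIR; A23202; A23202.",
]
# "B99999" is a made-up code standing for an entry absent from SWISS-PROT.
print(complement(["A23202", "B99999"], dr_lines))
```

The PIR entries whose codes are returned would then be written to the file
that complements SWISS-PROT.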


In release 18 we will invert the order of the information in the OS line.
Currently we have 'English common name (Latin name)'; we will switch to
'Latin name (English common name)'. Example:

        OS   HUMAN (HOMO SAPIENS).

   will be changed to:

        OS   HOMO SAPIENS (HUMAN).
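
For those who maintain programs that read the OS line, the change can be
sketched in Python as shown here. This is an illustration under a simplifying
assumption: the OS line fits on a single line and contains exactly one pair
of parentheses. The function name is ours and is not part of any
distributed software.

```python
import re

# Matches "OS   COMMON NAME (LATIN NAME)." as in the current format.
OS_LINE = re.compile(r"^OS   (?P<common>[^()]+?) \((?P<latin>[^()]+)\)\.$")


def invert_os_line(line):
    """Rewrite an OS line so that the Latin name comes first."""
    match = OS_LINE.match(line)
    if match is None:
        return line  # leave anything unexpected untouched
    return "OS   {} ({}).".format(match.group("latin"), match.group("common"))


print(invert_os_line("OS   HUMAN (HOMO SAPIENS)."))
```

Lines that do not fit the assumed layout are returned unchanged, so that
nothing is silently damaged.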


We also hope to provide in release 18 cross-references to TFD, the relational
database of transcription factors from David Ghosh (NCBI / USA).


3) News concerning SWISS-PROT availability
==========================================

 a) New SWISS-PROT entries and updates to existing entries are now available
    in  between  regular releases  from the EMBL File Server.  They  are not
    provided on a daily basis like new  nucleotide entries, but we intend to
    make at  least two or three sets  of  incremental  updates  between each
    release.

 b) SWISS-PROT is now available  for  download  by FTP from the NCBI server.
    All the files are in the \repository\SWISS-PROT directory.

 c) SWISS-PROT will soon also be available by FTP from the GenBank On-line
    Service (GOS) server.

=============================================================================
=============================================================================

Section: 3
Title  : Cross-references to OMIM in SWISS-PROT and ENZYME.

OMIM is the on-line version of Mendelian Inheritance in Man (MIM), the famous
book by Victor McKusick [1], which holds clinical data on a range of human
genetic diseases as well as all known gene loci. During the last five months
we have implemented cross-references to OMIM both in SWISS-PROT and ENZYME.

   [1]  McKusick Victor A.
        Mendelian Inheritance in Man
        Catalogs of autosomal dominant, autosomal recessive, and X-linked
        phenotypes
        Ninth edition
        Johns Hopkins University Press, Baltimore, (1990).


Practically what has been done in SWISS-PROT is the following:

  1) In each human protein entry whose gene was found to be described in
     OMIM, a DR (cross-reference) line was added that points to the OMIM
     six-digit catalog number.

     Example:

     DR   MIM; 261600; NINTH EDITION.

     Currently (in release 16.0 of SWISS-PROT) there are 840 human protein
     sequence entries with one or more DR lines that point to OMIM.

     A new document file, called MIMTOSP.TXT, is provided with SWISS-PROT. It
     is a sorted list of the MIM catalog entries cross-referenced in SWISS-
     PROT and the corresponding protein sequence entry names.

  2) If the protein is associated with a genetic defect or disease,  this has
     been indicated in the CC lines using the "DISEASE" topic.

     Examples:

     CC   -!- DISEASE: THIS ENZYME IS DEFICIENT IN TWO GENETIC DISEASES: THE
     CC       LESCH-NYHAN SYNDROME, IN WHICH THERE IS NO ENZYME ACTIVITY; AND
     CC       HYPERURICEMIA WITH AN EARLY ONSET OF GOUT, IN WHICH THERE IS
     CC       PARTIAL ENZYME ACTIVITY.

     CC   -!- DISEASE: DEFICIENCY OF THE ENZYME CAUSES PHENYLKETONURIA (PKU),
     CC       THE MOST COMMON INBORN ERROR OF AMINO ACID METABOLISM.

  3) If variants of the sequences are known, they have  been indicated in the
     feature table using the "VARIANT" key.

     Example:

     FT   VARIANT     103    103       S -> R (GOUT MUNICH).
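
An index of the kind held in MIMTOSP.TXT can be rebuilt from the DR lines
themselves. The Python sketch below is only an illustration: the exact layout
of MIMTOSP.TXT is not reproduced here, the entry name is assumed to be the
second field of the ID line, and the function name is ours.

```python
def mim_index(lines):
    """Map each MIM catalog number to the SWISS-PROT entry names citing it."""
    index = {}
    entry_name = None
    for line in lines:
        if line.startswith("ID   "):
            entry_name = line.split()[1]
        elif line.startswith("DR   MIM;") and entry_name is not None:
            # Line layout: DR   MIM; six-digit number; edition.
            number = line.split(";")[1].strip()
            index.setdefault(number, []).append(entry_name)
    return index


sample = [
    "ID   CAH2$HUMAN     STANDARD;      PRT;   259 AA.",
    "DR   MIM; 259730; NINTH EDITION.",
    "//",
]
for number, names in sorted(mim_index(sample).items()):
    print(number, ", ".join(names))
```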


Below is an example of a SWISS-PROT sequence entry which contains all three
types of MIM-related enhancements described above.

ID   CAH2$HUMAN     STANDARD;      PRT;   259 AA.
AC   P00918;
DT   21-JUL-1986  (REL. 01, CREATED)
DT   21-JUL-1986  (REL. 01, LAST SEQUENCE UPDATE)
DT   01-NOV-1990  (REL. 16, LAST ANNOTATION UPDATE)
DE   CARBONIC ANHYDRASE II (EC 4.2.1.1) (CARBONATE DEHYDRATASE II) (GENE
DE   NAME: CA2).
OS   HUMAN (HOMO SAPIENS).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC   EUTHERIA; PRIMATES.
RN   [1] (SEQUENCE FROM N.A.)
RA   MONTGOMERY J.C., VENTA P.J., TASHIAN R.E., HEWETT-EMMETT D.;
RL   NUCLEIC ACIDS RES. 15:4687-4687(1987).
RN   [2] (SEQUENCE FROM N.A.)
RA   MURAKAMI H., MARELICH G.P., GRUBB J.H., KYLE J.W., SLY W.S.;
RL   GENOMICS 1:159-166(1987).
RN   [3] (SEQUENCE)
RA   HENDERSON L.E., HENRIKSSON D., NYMAN P.O.;
RL   J. BIOL. CHEM. 251:5457-5463(1976).
RN   [4] (SEQUENCE)
RA   LIN K.-T.D., DEUTSCH H.F.;
RL   J. BIOL. CHEM. 249:2329-2337(1974).
RN   [5] (SEQUENCE OF 1-76 FROM N.A.)
RA   VENTA P.J., MONTGOMERY J.C., HEWETT-EMMETT D., TASHIAN R.E.;
RL   BIOCHIM. BIOPHYS. ACTA 826:195-201(1985).
RN   [6] (X-RAY CRYSTALLOGRAPHY, 2.0 ANGSTROMS)
RA   LILJAS A., KANNAN K.K., BERGSTEN P.-C., WAARA I., FRIDBORG K.,
RA   STRANDBERG B., CARLBOM U., JARUP L., LOVGREN S., PETEF M.;
RL   NATURE NEW BIOL. 235:131-137(1972).
RN   [7] (X-RAY CRYSTALLOGRAPHY, 2.0 ANGSTROMS)
RA   ERIKSSON A.E., JONES T.A., LILJAS A.;
RL   PROTEINS 4:274-282(1988).
RN   [8] (X-RAY CRYSTALLOGRAPHY, 2.0 ANGSTROMS)
RA   ERIKSSON A.E., KYLSTEN P.M., JONES T.A., LILJAS A.;
RL   PROTEINS 4:283-293(1988).
RN   [9] (JOGJAKARTA VARIANT)
RA   JONES G.L., SOFRO A.S.M., SHAW D.C.;
RL   BIOCHEM. GENET. 20:979-1000(1982).
RN   [10] (MELBOURNE VARIANT)
RA   JONES G.L., SHAW D.C.;
RL   HUM. GENET. 63:392-399(1983).
CC   -!- CATALYTIC ACTIVITY: H(2)CO(3) = CO(2) + H(2)O (REVERSIBLE
CC       HYDRATION OF CARBON DIOXIDE).
CC   -!- THERE ARE AT LEAST 6 ENZYMATIC FORMS OF CARBONIC ANHYDRASE: CA-I
CC       (OR B), CA-II (OR C), CA-III (OR M), CA-IV, CA-V AND CA-VI.
CC   -!- DISEASE: DEFECTS IN CA2 ARE THE CAUSE OF OSTEOPETROSIS WITH RENAL
CC       TUBULAR ACIDOSIS (MARBLE BRAIN DISEASE).
DR   EMBL; Y00339; HSCA2.
DR   EMBL; X03251; HSCAII.
DR   EMBL; J03037; HSCAIIA.
DR   PIR; A01141; CRHU2.
DR   PIR; A23202; A23202.
DR   PIR; A27175; A27175.
DR   PDB; 1CA2; 15-JAN-90.
DR   PDB; 2CA2; 15-APR-90.
DR   PDB; 3CA2; 15-APR-90.
DR   MIM; 259730; NINTH EDITION.
KW   LYASE; ACETYLATION; ZINC; 3D-STRUCTURE.
FT   INIT_MET      0      0
FT   MOD_RES       1      1       ACETYLATION.
FT   ACT_SITE     63     63
FT   ACT_SITE     66     66
FT   METAL        93     93       ZINC, CATALYTIC.
FT   METAL        95     95       ZINC, CATALYTIC.
FT   METAL       118    118       ZINC, CATALYTIC.
FT   ACT_SITE    126    126
FT   ACT_SITE    196    198
FT   VARIANT      17     17       K -> E (JOGJAKARTA).
FT   VARIANT     235    235       P -> H (MELBOURNE).
FT   VARIANT     251    251       N -> D.
SQ   SEQUENCE   259 AA;  29115 MW;  365693 CN;
     SHHWGYGKHN GPEHWHKDFP IAKGERQSPV DIDTHTAKYD PSLKPLSVSY DQATSLRILN
     NGHAFNVEFD DSQDKAVLKG GPLDGTYRLI QFHFHWGSLD GQGSEHTVDK KKYAAELHLV
     HWNTKYGDFG KAVQQPDGLA VLGIFLKVGS AKPGLQKVVD VLDSIKTKGK SADFTNFDPR
     GLLPESLDYW TYPGSLTTPP LLECVTWIVL KEPISVSSEQ VLKFRKLNFN GEGEPEELMV
     DNWRPAQPLK NRQIKASFK
//
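
A simple consistency check that such an entry allows is to compare the
feature table with the sequence. The Python sketch below verifies the length
announced in the SQ line and the original residue of each VARIANT feature. It
assumes that feature positions are counted from 1 on the sequence as printed
(which agrees with the entry above); the names used are ours.

```python
import re

SQ_LINES = [
    "SHHWGYGKHN GPEHWHKDFP IAKGERQSPV DIDTHTAKYD PSLKPLSVSY DQATSLRILN",
    "NGHAFNVEFD DSQDKAVLKG GPLDGTYRLI QFHFHWGSLD GQGSEHTVDK KKYAAELHLV",
    "HWNTKYGDFG KAVQQPDGLA VLGIFLKVGS AKPGLQKVVD VLDSIKTKGK SADFTNFDPR",
    "GLLPESLDYW TYPGSLTTPP LLECVTWIVL KEPISVSSEQ VLKFRKLNFN GEGEPEELMV",
    "DNWRPAQPLK NRQIKASFK",
]
FT_LINES = [
    "FT   VARIANT      17     17       K -> E (JOGJAKARTA).",
    "FT   VARIANT     235    235       P -> H (MELBOURNE).",
    "FT   VARIANT     251    251       N -> D.",
]

# Join the sequence lines and drop the blanks between blocks of ten residues.
sequence = "".join(SQ_LINES).replace(" ", "")

VARIANT = re.compile(r"^FT   VARIANT\s+(\d+)\s+(\d+)\s+([A-Z]) -> ([A-Z])")


def check_variants(ft_lines, sequence):
    """Return (position, expected, found) for every variant that disagrees."""
    mismatches = []
    for line in ft_lines:
        match = VARIANT.match(line)
        if match is None:
            continue
        position, original = int(match.group(1)), match.group(3)
        found = sequence[position - 1]  # positions are counted from 1
        if found != original:
            mismatches.append((position, original, found))
    return mismatches


print(len(sequence))                       # length to compare with "259 AA"
print(check_variants(FT_LINES, sequence))  # an empty list means no mismatch
```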



In ENZYME we have added a "DI" (DIsease) line for all enzymes which are known
to be associated with a genetic defect, as shown in the following example:

DI   PHENYLKETONURIA; MIM:261600.

Here is an example of an ENZYME entry with a DI line:

ID   4.2.1.1
DE   CARBONIC DEHYDRATASE.
AN   CARBONIC ANHYDRASE.
CA   H(2)CO(3) = CO(2) + H(2)O.
CF   ZINC.
DI   OSTEOPETROSIS-RENAL TUBULAR ACIDOSIS SYNDROME; MIM:259730.
DR   P00917, CAH1$HORSE;  P00915, CAH1$HUMAN;  P00916, CAH1$MACMU;
DR   P13634, CAH1$MOUSE;  P07452, CAH1$RABIT;  P00921, CAH2$BOVIN;
DR   P07630, CAH2$CHICK;  P00918, CAH2$HUMAN;  P00920, CAH2$MOUSE;
DR   P00919, CAH2$RABIT;  P00922, CAH2$SHEEP;  P07450, CAH3$HORSE;
DR   P07451, CAH3$HUMAN;  P16015, CAH3$MOUSE;  P14141, CAH3$RAT  ;
DR   P18915, CAH6$BOVIN;  P18761, CAH6$MOUSE;  P08060, CAH6$SHEEP;
DR   P17067, CAHC$PEA  ;  P16016, CAHC$SPIOL;
//
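
The DR lines of an ENZYME entry can be read back into pairs of SWISS-PROT
accession number and entry name. The Python sketch below is an illustration
only; it assumes the layout seen above (pairs separated by ';', the two
members of a pair separated by ','), and the function name is ours.

```python
def enzyme_xrefs(lines):
    """Return (accession, entry name) pairs from the DR lines of an entry."""
    pairs = []
    for line in lines:
        if not line.startswith("DR   "):
            continue
        for item in line[5:].split(";"):
            if "," not in item:
                continue  # skips the empty piece after the last ';'
            accession, name = item.split(",")
            pairs.append((accession.strip(), name.strip()))
    return pairs


sample = [
    "DR   P07451, CAH3$HUMAN;  P16015, CAH3$MOUSE;  P14141, CAH3$RAT  ;",
    "DR   P17067, CAHC$PEA  ;  P16016, CAHC$SPIOL;",
]
human = [name for _, name in enzyme_xrefs(sample) if name.endswith("$HUMAN")]
print(human)
```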

=============================================================================
=============================================================================

Section: 4
Title  : Biomolecular databases integration: current status.

In the last six months there have been a number of developments in the
integration of biomolecular databases:

   1) The EMBL Nucleotide Sequence  Database is now fully cross-referenced to
      SWISS-PROT.
   2) SWISS-PROT and ENZYME are now cross-referenced to MIM (see section 3 of
      this letter).
   3) Cross-references have  been added  in SWISS-PROT to REBASE, the type II
      restriction enzymes data base.
   4) The new release (9012) of the  Drosophila Genetic Maps (DMAP)  database
      from Michael Ashburner (Cambridge / U.K.)  is  now  cross-referenced to
      EMBL/GenBank, SWISS-PROT and PIR.
   5) The new release (2.0) of the Transcription Factors Database (TFD) from
      David Ghosh (NCBI / USA) is now cross-referenced to EMBL/GenBank,
      SWISS-PROT and PIR.

The current status of the relationships between the biomolecular databases is
shown in the following schematic:

                                                        *********************
                        ***********************  <----- * EPD   [Promoters] *
                        *  EMBL Nucleotide    *         *********************
*****************       *  Sequence Data      *
* DMAP          * ----> *  Library            *         *********************
* [Drosophila   *       ***********************  <----- * ECD   [E.coli]    *
* Genetic maps] *                ^  |       ^           *********************
***************** ------- +      |  |       |
                          |      |  |       |           *********************
Version: Jan. 10          |      |  |       +---------- * TFD [Trans.fact.] *
         1991             |      |  |       |           *********************
                          |      |  |       |
*****************         v      |  v       v           *********************
* PROSITE       * <---- ***********************  <----- * ENZYME [Nomencl.] *
* [Patterns]    * ----> *  SWISS-PROT         *         *********************
*****************       *  Protein Sequence   *             |
                        *  Data Bank          *             |
*****************       ***********************             v
* REBASE        *         |       |        |            *********************
* [Restriction  * <-------+       |        +--------->  * OMIM   [Diseases] *
* enzymes]      *                 |                     *********************
*****************                 v
                        ***********************
                        * PDB [3D structures] *
                        ***********************

We believe that it is now possible for software developers to start building
hypertext-oriented software packages that can navigate between the different
biomolecular data banks.
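
As a rough idea of what such navigation involves, the cross-references can be
treated as a directed graph and searched for a chain of links between two
databases. The Python sketch below encodes only the five developments listed
at the start of this section (not the complete schematic), and the names
XREFS and find_path are ours.

```python
from collections import deque

# "A": ["B", ...] means that database A holds cross-references to B.
XREFS = {
    "EMBL": ["SWISS-PROT"],
    "SWISS-PROT": ["OMIM", "REBASE"],
    "ENZYME": ["OMIM"],
    "DMAP": ["EMBL", "SWISS-PROT", "PIR"],
    "TFD": ["EMBL", "SWISS-PROT", "PIR"],
}


def find_path(start, goal):
    """Breadth-first search for the shortest chain of cross-references."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in XREFS.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # the goal cannot be reached from the start


print(find_path("DMAP", "OMIM"))
```

A real package would of course follow the individual entry identifiers stored
in each cross-reference, and not only the names of the databases.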

=============================================================================
=============================================================================

Section: 5
Title  : NCBI, the GenInfo Backbone Database, and the ASN.1 syntax.

The National Center for Biotechnology Information (NCBI), at the National
Library of Medicine (NLM) (Bethesda, Maryland), is involved in the development
of a database building system that addresses the problems of integrated
information as well as currency and accessibility. One of their projects is
the production of an integrated nucleic acid and protein sequence database,
called the GenInfo Backbone Database ('Backbone'), that accurately reflects
the journal literature. The Backbone will include all protein sequences of at
least three amino acids and nucleotide sequences of at least nine bases. The
annotations provided by the Backbone are minimal: it is meant to reflect the
data presented in the scientific literature, not to model biological reality.
The Backbone is a database which will, hopefully, help to build and maintain
fully annotated databases, such as SWISS-PROT, PROSITE or ENZYME.

As the Backbone is a database on which to build other databases, the NCBI had
to select a  reliable data  exchange  standard  to facilitate the exchange of
information between  biomolecular  databases.  The  standard  which has  been
chosen is called ASN.1  (Abstract Syntax Notation 1), also known as ISO 8824.
ASN.1 is specifically designed to allow a formal precise definition  of  what
is exchanged  between two applications without specifying  how  it is  to  be
represented or used by either application.

The NCBI is also committed to developing and distributing a software toolbox
that will help software and database developers to interact with the ASN.1
notation and the Backbone.

As a user of biomolecular databases you will probably not have to deal
directly with the Backbone or with the ASN.1 format, unless you want to
develop a new specialized biomolecular database, but you should be aware of
the existence of such projects and of the many positive consequences for the
scientific community of such an endeavor, if it is successful. As we believe
in the scientific validity and relevance of these projects we have decided to
participate. Our participation will take at least two forms: we will provide
SWISS-PROT, ENZYME, and PROSITE in the ASN.1 syntax (the existing format will
not be discontinued) and we will start to use the Backbone as a source of
primary (literature) data for SWISS-PROT.

As a first step we have produced an ASN.1  specification  for the ENZYME data
bank and will soon start to distribute an ASN.1 version of that database (see
section 8 of this newsletter).

=============================================================================
=============================================================================

Section: 6
Title  : PROSITE news.

1) Release 6.0
==============

Release 6.0 of PROSITE contains 375 documentation chapters that describe 433
different patterns. Since release 5.1, 77 new chapters have been added and 131
have been updated. Release 6.0 is fully cross-referenced with release 16 of
SWISS-PROT.

There have been no changes in the format of the files of the data base.


2) Future developments
======================

- Release 6.10 will come out in March 1991 with release 17 of SWISS-PROT.
  As was the case for release 5.10, it will not be a "real" update: it will
  only update pointers to SWISS-PROT for sequence entries whose names have
  been modified from release 16 to 17.

- Release 7.0 will come out  with  release 18  of SWISS-PROT in early summer
  1991. There will be lots of new pattern entries.   We can already announce
  the following ones (as they are either ready or being written):

  - 6-phosphogluconate dehydrogenase signature
  - Catalase signatures
  - Peroxidases signature
  - Acyltransferases ChoActase / COT / CPT-II family signatures
  - Chalcone synthase and resveratrol synthase signature
  - Glutamine amidotransferases class-I active site
  - Glutamine amidotransferases class-II active site
  - Polyprenyl synthetases signature
  - Eukaryotic RNA polymerases 30 to 40 Kd subunits signature
  - Prokaryotic carbohydrate kinases signature
  - DNA polymerase family A signature
  - Clostridium cellulases repeated domain signature
  - ATP synthase a subunit signature
  - Aconitase signature
  - Guanylate cyclases signature
  - FKBP peptidyl-prolyl cis-trans isomerase signatures
  - Sodium symporters signatures
  - Natriuretic peptides receptors signature
  - PF4/IL-8 cytokines signatures
  - Myotoxins signature
  - Pathogenesis-related proteins BetvI family signature


  We have a large (and growing) list of new patterns to add. Some of those
  that are currently in the `pipeline' are listed below.


  - SH2 and SH3 domains
  - Animal lectin domain
  - Bacterial sensory transduction proteins signatures
  - Alpha-macroglobulin family signature
  - Clusterins signature
  - Plants 2S seed storage proteins signature
  - TNF/NGF receptors family signature
  - Small heat shock proteins (HSP20)

  But this is far from being a complete list !

  We  have not yet received any matrices from any sources so the introduction
  of matrices in PROSITE is probably not for release 7.0.


3) On-line experts
==================

We have added, in the PROSITE documentation file (PROSITE.DOC), the email
addresses of experts for specific fields. This information is present
in the following format:

-Expert(s) to contact by email: Name X.Y.
                                name at location.network

As you can see from the following table, our current list of experts is still
very small, so I would like to call again for volunteers (the `requirements'
to be fulfilled to become an on-line expert are listed at the end of this
section). Please don't be shy !!!


Field of expertise            Name                 Email address
---------------------------   ------------------   --------------------------
Alcohol dehydrogenases        Bengt P.             bengt at medfys.ki.se
Aldehyde dehydrogenases       Bengt P.             bengt at medfys.ki.se
Apolipoproteins               Boguski M.S.         boguski at ncbi.nlm.nih.gov
Arrestins                     Kolakowski L.F. Jr.  lfk at athena.mit.edu
Bacteriophage P4              Halling C.           chh9 at midway.uchicago.edu
Beta-lactamases               Brannigan J.         bafm1 at cluster.sussex.ac.uk
Chitinases                    Henrissat B.         cermav at frgren81.bitnet
CTF/NF-I                      Mermod N.            nmermod at clsuni51.bitnet
EF-hand calcium-binding       Cox J.A.             cox at cgeuge52.bitnet
                              Kretsinger R.H.      rhk5i at virginia.bitnet
Glucanases                    Henrissat B.         cermav at frgren81.bitnet
                              Beguin P.            phycel at pasteur.bitnet
Eryf1-type zinc-fingers       Boguski M.S.         boguski at ncbi.nlm.nih.gov
G-protein coupled receptors   Chollet A.           chollet at clients.switch.ch
Inorganic pyrophosphatases    Kolakowski L.F. Jr.  lfk at athena.mit.edu
Integrases                    Roy P.H.             2020000 at lavalvx1.bitnet
Protein kinases               Hanks S.             hanks at vuctrvax
Restriction-modification      Bickle T.            bickle at urz.unibas.ch
                              Roberts R.J.         roberts at cshl.org
Ring-cleavage dioxygenases    Harayama S.          harayama at cgecmu51.bitnet
Subtilisin family proteases   Brannigan J.         bafm1 at cluster.sussex.ac.uk
Thiol proteases               Turks B.             turk at ijs.ac.mail.yu
Thiol proteases inhibitors    Turks B.             turk at ijs.ac.mail.yu
TPR repeats                   Boguski M.S.         boguski at ncbi.nlm.nih.gov
Transit peptides              von Heijne G.        gunnar at cbts.sunet.se
Type-II membrane antigens     Levy S.              levy at cellbio.stanford.edu


Requirements to fulfill to become an on-line expert
===================================================

An expert should be a scientist who works with a specific family (or
families) of proteins (or specific domains) and who would:

  a) Review the protein sequences in SWISS-PROT and the patterns/matrices
     in PROSITE relevant to their field of research.
  b) Agree to be contacted by people who have obtained new sequence(s)
     which seem to belong to "their" family (or families) of proteins.
  c) Have access to electronic mail  and be willing to use it to send and
     receive data.

If you are willing to be part of  this  scheme please contact me (but, please
by email exclusively !)

=============================================================================
=============================================================================

Section: 7
Title  : Updated list of public domain programs which make use of PROSITE.

I have  been  made  aware  of the development of the following public domain
software packages that make use of PROSITE.

1) MacPattern
=============

Apple Macintosh application. Offers features like a pattern list for pattern
selection, direct access to documentation in PROSITE,  pattern sets, pattern
entering by keyboard, etc. It can  read SWISS-PROT, PIR, DNA Strider, DNAid,
Pearson and plain ASCII sequences. MacPattern can also use any other pattern
database adhering to the PROSITE syntax, even DNA patterns. No special hard-
or software is required.

Contact  : Rainer Fuchs
           fuchs at embl.bitnet
Version  : 1.1
Available: On the EMBL File Server: MAC_SOFTWARE:MACPATTERN.HQX


2) Scrutineer
=============

SCRUTINEER is a sophisticated pattern searching and database analysis program
written by Peter Sibbald at EMBL.  The program is written in Pascal and comes
complete with source, manual and on-line help. SCRUTINEER is described in the
following reference:

  Sibbald P.R., Argos P.
  Scrutineer: a computer program that flexibly seeks and describes motifs
  and profiles in protein sequence databases.
  CABIOS 6:279-288(1990).

SCRUTINEER works on VAXes, and apparently can be made to run on UNIX systems.
The November 1990 version of SCRUTINEER adds, among other enhancements, the
possibility of searching for all of the PROSITE patterns in one or more
protein sequences.

Contact  : Peter Sibbald
           sibbald at embl.bitnet
Version  : Nov. 1990
Available: On the EMBL File Server: VAX_SOFTWARE:MACPATTERN.UAA


3) ProSearch
============

A program, written mostly in AWK, that runs under Unix and searches a
protein sequence for all of the PROSITE patterns. Note: it will also run
under MS-DOS and VMS if you have access to a public domain or commercial
version of AWK on such systems.

Contact  : Lee F. Kolakowski
           lfk at athena.mit.edu
Version  : 1.1
Available: On the EMBL File Server: UNIX_SOFTWARE:PROSEARCH.UUE


4) CREGEX
=========

CREGEX creates, from the native PROSITE data bank, a file containing valid
AWK regular expressions that can then be used with the ProSearch program.

Contact  : Jack Leunissen
           jackl at caos.caos.kun.nl
Version  : 1.1
Available: On the EMBL File Server: UNIX_SOFTWARE:CREGEX.C
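
To give an idea of what translating PROSITE patterns into regular expressions
involves, here is a short sketch in Python. It is not the code of CREGEX or
ProSearch (which work with AWK regular expressions), and it rests on our
reading of the PROSITE pattern syntax, which is not described in this
newsletter: elements separated by '-', 'x' for any residue, [..] for
accepted residues, {..} for excluded residues, (n) or (n,m) for repetitions,
and '<' or '>' to anchor a pattern to either end of the sequence. The test
sequence is made up.

```python
import re

ELEMENT = re.compile(r"(<?)([^()<>]+)(?:\((\d+(?:,\d+)?)\))?(>?)")


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: " + element)
        start, core, count, end = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        piece = core + ("{" + count + "}" if count else "")
        parts.append(("^" if start else "") + piece + ("$" if end else ""))
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)
for hit in re.finditer(regex, "MKNGSAW"):  # made-up test sequence
    print(hit.start() + 1, hit.group())
```

Note that re.finditer reports non-overlapping matches only; a complete search
program would also have to deal with overlapping occurrences.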


5) PROINDEX
===========

VAX-Fortran program to create an index built  from the information stored in
the DE lines of the PROSITE.DAT file.

Contact  : Steve Clark
           clark at utoroci.bitnet or clark at mshri.utoronto.ca
Available: On the EMBL File Server: VAX_SOFTWARE:PROINDEX.UUE


6) PROSITEC
===========

VAX-Pascal program to convert the PROSITE files into GCG FIND-format.

Contact  : Kay Hofmann
           akc01 at dk0rrzk1.bitnet
Version  : 1.1
Available: On the EMBL File Server: VAX_SOFTWARE:PROSITEC.UUE


7) ProDoc
=========

VAX program  for  the GCG package to  display  documentation  entries in the
PROSITE.DOC file, given a documentation entry number.

Contact  : Anne Marie Quinn
           quinn at salk.bitnet
Available: By anonymous ftp on: SALK-SC2.SDSC.EDU


8) BISANCE system
=================

A program to interrogate PROSITE is available on-line  on the BISANCE system
of the French CITI2 biocomputing resource.

Contact:  Philippe Dessen
          dessen at frciti51.bitnet

=============================================================================
=============================================================================

Section: 8
Title  : ENZYME news.

There are a few things we want to point out about release 3.0 of ENZYME as
well as about future releases.

1) Completeness
===============

Currently the data bank contains full information about the recommended name,
alternative name(s), catalytic activity, cofactor(s) of ALL 3071 enzymes. The
ENZYME data bank can now be considered as fully operational.


2) The DI line
==============

As described in section 3 of this newsletter, a new line type 'DI'
(= DIsease) was implemented (starting with release 2.0) so as to add
cross-references to MIM (Mendelian Inheritance in Man).

The precise format of the DI line is:

 DI   DISEASE_NAME; MIM:NUMBER.

Where 'NUMBER' is the MIM catalog number of the disease (or phenotype).

Examples:

 DI   XANTHINURIA; MIM:278300.
 DI   PHENYLKETONURIA; MIM:261600.
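
A program reading ENZYME could recognize this format with a single regular
expression. The Python sketch below is an illustration only; it assumes that
MIM catalog numbers always have six digits, and the names used are ours.

```python
import re

# DI   DISEASE_NAME; MIM:NUMBER.
DI_LINE = re.compile(r"^\s*DI   (?P<disease>.+); MIM:(?P<number>\d{6})\.$")


def parse_di(line):
    """Return (disease name, MIM number) for a DI line, or None otherwise."""
    match = DI_LINE.match(line)
    if match is None:
        return None
    return match.group("disease"), match.group("number")


print(parse_di("DI   XANTHINURIA; MIM:278300."))
```

The MIM number is kept as a string so that it can be compared directly with
the numbers found in the DR lines of SWISS-PROT.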


3) Future releases
==================

Until new enzyme nomenclature data is published, we only plan to update the
SWISS-PROT pointers at each release of the protein sequence data bank,
correct any errors, and complete the information concerning synonyms and
cofactors using the literature.

4) An ASN.1 version of ENZYME
=============================

We will  soon start to  distribute  a  version of ENZYME in the ASN.1  syntax
which has been selected by the NCBI to facilitate the exchange of information
between biomolecular databases (see section 5 of this newsletter).


We will continue to distribute ENZYME in  its  current format, but there will
be two additional files:

ECSPEC.ASN:  ENZYME database ASN.1 specification.  This  file  describes  the
             syntax used by the ASN.1 version of the ENZYME data base.
ENZYME.ASN:  ENZYME database in ASN.1 notation.

We will not list here the full ENZYME database ASN.1 specification, but to
give you a "flavor" of ASN.1, here is an example of an entry in both the
original and the ASN.1 format:

ID   1.4.3.14
DE   L-LYSINE OXIDASE.
AN   LYSYL OXIDASE.
CA   L-LYSINE + O(2) + H(2)O = 2-OXO-6-AMINOHEXANOATE + NH(3) + H(2)O(2).
CF   COPPER; PQQ.
CC   -!- ALSO ACTS, MORE SLOWLY, ON L-ORNITHINE, L-PHENYLALANINE, L-ARGININE,
CC       AND L-HISTIDINE.
DI   CUTIS LAXA (EHLERS-DANLOS SYNDROME IX); MIM:304150.
DI   LYSINE INTOLERANCE; MIM:247900.
DR   P16636, LYOX$RAT  ;
//

This entry is represented in the ASN.1 notation, following the specifications
that we have developed for it, by:

Enzyme-activity ::= {
 ecnumb {
        class 1 ,
        subclass 4 ,
        sub-subclass 3 ,
        serial-numb 14 } ,
 status data
 {
 name "L-LYSINE OXIDASE." ,
 synonyms { "LYSYL OXIDASE." } ,
 reaction reac-equa {
     left  { { stoich "1" , compound { chem-name "L-LYSINE" } } ,
             { stoich "1" , compound { chem-name "O(2)" } } ,
             { stoich "1" , compound { chem-name "H(2)O" } } } ,
     right { { stoich "1" , compound { chem-name "2-OXO-6-AMINOHEXANOATE" } } ,
             { stoich "1" , compound { chem-name "NH(3)" } } ,
             { stoich "1" , compound { chem-name "H(2)O(2)" } } } } ,
 cofactors { { chem-name "COPPER" } ,
             { chem-name "PQQ" } } ,
 comments { "ALSO ACTS, MORE SLOWLY, ON L-ORNITHINE, L-PHENYLALANINE,
             L-ARGININE, AND L-HISTIDINE." } ,
 disease { { disease-name "CUTIS LAXA (EHLERS-DANLOS SYNDROME IX)",
             MIM-numb 304150 } ,
           { disease-name "LYSINE INTOLERANCE", MIM-numb 247900 } } ,
 x-ref { { db-name "SPROT", ident-1 "P16636", ident-2 "LYOX$RAT" } }
 }
}
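
The main difference between the two forms is that the ASN.1 version breaks
the ID and CA lines into their components. The Python sketch below shows that
decomposition for the example above. It is an illustration only and not the
program used to produce ENZYME.ASN; the function names are ours, and
stoichiometric coefficients are not handled.

```python
def split_ec_number(ec_number):
    """Split an EC number such as '1.4.3.14' into its four levels."""
    names = ("class", "subclass", "sub-subclass", "serial-numb")
    return dict(zip(names, (int(part) for part in ec_number.split("."))))


def split_reaction(ca_text):
    """Split the text of a CA line into left-hand and right-hand compounds."""
    left, right = ca_text.rstrip(".").split(" = ")
    return left.split(" + "), right.split(" + ")


print(split_ec_number("1.4.3.14"))
print(split_reaction(
    "L-LYSINE + O(2) + H(2)O = 2-OXO-6-AMINOHEXANOATE + NH(3) + H(2)O(2)."
))
```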

=============================================================================
=============================================================================

Section: 9
Title  : Specialized databases part 1: the P450 database.

We will use this  section  to  describe  specialized  biomolecular  databases
which,  in our opinion, are important, yet not very well known. In this first
issue we briefly describe:

********************************
* The cytochrome P450 database *
********************************

Produced by the group of Alexander Archakov at the Institute of Biological
and Medical Chemistry of the USSR Academy of Medical Sciences in Moscow, this
database  contains  a  wealth  of  information  on  cytochromes P450:  names,
sequences,  genome  location,  inducers,  substrates,  etc.     The  database
supplements the book of A.I. Archakov  and G.I. Bachmanova: "Cytochrome P-450
and active oxygen", published by Taylor and Francis Ltd in 1990.

The database  is distributed, for MS/PC-DOS based systems, in two  forms: the
first one, called DBCPD, runs under  dBase III  plus, the second one,  called
RBCPD, runs under Rbase. Both forms are menu-driven and are very easy to use.

The group of Archakov can be contacted at the following address:

   Prof. A.I. Archakov
   Institute of Biological and Medical Chemistry
   USSR Academy of Medical Sciences
   Pogodinskaya str. 10
   119838 Moscow
   USSR

   Fax: (+7) (095) 938 21 23
        (+7) (095) 245 08 57

=============================================================================
====== End of GePSAN Newsletter Volume 1 - Number 1 =========================


