Release notes of SWISS-PROT 27 / PROSITE 11

Amos Bairoch BAIROCH at CMU.UNIGE.CH
Thu Nov 18 16:52:53 EST 1993


                    SWISS-PROT RELEASE 27.0 RELEASE NOTES


                               1. INTRODUCTION

   1.1  Evolution

   Release 27.0 of SWISS-PROT contains 33329 sequence entries, comprising
   11'484'420 amino acids abstracted from 32314 references. This represents
   an increase of 5.6% in amino acids (4.8% in entries) over release 26.
   The recent growth of the data bank is summarized below.

   Release    Date   Number of entries     Nb of amino acids

   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
   14.0       04/90              15409             4 914 264
   15.0       08/90              16941             5 486 399
   16.0       11/90              18364             5 986 949
   17.0       02/91              20024             6 524 504
   18.0       05/91              20772             6 792 034
   19.0       08/91              21795             7 173 785
   20.0       11/91              22654             7 500 130
   21.0       03/92              23742             7 866 596
   22.0       05/92              25044             8 375 696
   23.0       08/92              26706             9 011 391
   24.0       12/92              28154             9 545 427
   25.0       04/93              29955            10 214 020
   26.0       07/93              31808            10 875 091
   27.0       10/93              33329            11 484 420
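
   The growth between the last two rows of the table can be checked with a
   few lines of Python. This is only a sketch; the function and variable
   names are chosen for illustration. It shows that the 5.6% figure quoted
   in section 1.1 matches the growth in amino acids, while the number of
   entries grew slightly less.

```python
# Figures copied from the last two rows of the table: (entries, amino acids).
RELEASES = {
    "26.0": (31808, 10875091),
    "27.0": (33329, 11484420),
}


def percent_increase(old, new):
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


old_entries, old_residues = RELEASES["26.0"]
new_entries, new_residues = RELEASES["27.0"]

entry_growth = percent_increase(old_entries, new_entries)
residue_growth = percent_increase(old_residues, new_residues)

print(f"entries:     +{entry_growth:.1f}%")
print(f"amino acids: +{residue_growth:.1f}%")
```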

   1.2  Source of data

   Release 27.0  has been  updated using protein sequence data from release
   37.0 of  the PIR (Protein Identification Resource) protein data bank, as
   well as translation of nucleotide sequence data from release 36.0 of the
   EMBL Nucleotide Sequence Database.

   As an indication of the source of the sequence data in the SWISS-PROT
   data bank we list here the statistics concerning the DR (Database cross-
   references) pointer lines:

   Entries with pointer(s) to only PIR entri(es):            4553
   Entries with pointer(s) to only EMBL entri(es):           4557
   Entries with pointer(s) to both EMBL and PIR entri(es):  23557
   Entries with no pointer lines:                             662
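
   A tally of this kind could be produced along the following lines. This
   is a minimal sketch, not the procedure actually used to build the
   release. It assumes the DR line layout shown in section 2.4 ("DR", then
   the data bank identifier as the first ';'-separated field), it assumes
   that an entry with neither an EMBL nor a PIR pointer counts as having no
   pointer, and the sample entries and their identifiers are invented.

```python
from collections import Counter


def classify_entry(entry_lines):
    """Say whether an entry's DR lines point to PIR, EMBL, both or neither."""
    banks = set()
    for line in entry_lines:
        if line.startswith("DR   "):
            # The data bank identifier is the first ';'-separated field.
            bank = line[5:].split(";")[0].strip()
            if bank in ("PIR", "EMBL"):
                banks.add(bank)
    if banks == {"PIR"}:
        return "PIR only"
    if banks == {"EMBL"}:
        return "EMBL only"
    if banks == {"PIR", "EMBL"}:
        return "both"
    return "none"


# Invented entries (DR lines only); the identifiers are placeholders.
SAMPLE_ENTRIES = [
    ["DR   PIR; X00001; X00001."],
    ["DR   EMBL; Y00001; XX000001."],
    ["DR   EMBL; Y00002; XX000002.", "DR   PIR; X00002; X00002."],
    ["DR   PROSITE; PS99999; EXAMPLE."],
]
tally = Counter(classify_entry(entry) for entry in SAMPLE_ENTRIES)
print(dict(tally))

# The four published counts add up to the number of entries in release 27.
PUBLISHED = {"PIR only": 4553, "EMBL only": 4557, "both": 23557, "none": 662}
print(sum(PUBLISHED.values()))
```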




      2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 26


   2.1  Sequences and annotations

   About 1532 sequences have been added since release 26, the sequence data
   of 213  existing entries  has been  updated and  the annotations of 3000
   entries have  been revised.  In particular we have used review  articles
   to update  the annotations  of  the  following  groups  or  families  of
   proteins:

   -  Aspartate and glutamate racemases
   -  Bacteriophage T4 proteins
   -  Band 3 anion proteins
   -  Beta amylases
   -  Cysteine synthases
   -  Deoxyribonuclease I
   -  Ependymins
   -  Epimorphin family
   -  GTP1/Obg family
   -  Fork head domain proteins
   -  G-linked receptors family 2
   -  Glutamate 5-kinase
   -  HIT family
   -  Lysyl oxidases
   -  mutT domain proteins
   -  Nitrilases / cyanide hydratase
   -  Peripherin / rom-1
   -  Phosphatidylinositol 3-kinases
   -  Pollen proteins Ole e I family
   -  Prokaryotic transglycosylases
   -  Protein prenyltransferases alpha subunit repeat
   -  Renal dipeptidases
   -  Trehalases



   2.2  A special emphasis on four "model" organisms

   We have selected four organisms that are the target of genome sequencing
   and/or mapping projects and for which we intend to:

   -  Be as complete as possible. All sequences available at a given time
      should be immediately included in SWISS-PROT. This also includes
      sequence corrections and updates.
   -  Provide a high level of annotation.
   -  Provide cross-references to specialized database(s) that contain,
      among other data, some genetic information about the genes that code
      for these proteins.
   -  Provide specific indices or documents.


   The four organisms selected are:

   o  Caenorhabditis elegans (worm)
   o  Drosophila melanogaster (fly)
   o  Escherichia coli
   o  Saccharomyces cerevisiae (yeast)

   Such a special effort has been going on for more than a year for
   Escherichia coli (thanks to a very fruitful collaboration with Ken
   Rudd). It has started in this release for yeast: about 300 new yeast
   sequences were entered, about 1500 entries were reannotated, and a new
   document (YEAST.TXT) is provided that lists yeast entries in SWISS-PROT
   classified by gene name and synonym(s). The next release will target
   C.elegans thanks to a collaboration with the group at the Sanger Genome
   Center in Hinxton (UK). The Drosophila "project" should start a bit
   later.

   Organism          Database                 Index file
                     X-referenced             provided
   --------------    ----------------------   --------------
   C.elegans         WormPep                  Next release
   D.melanogaster    Flybase                  In preparation
   E.coli            EcoGene                  ECOLI.TXT
   S.cerevisiae      LISTA (in preparation)   YEAST.TXT


   2.3  The Expasy World-Wide Web server

   Recent months have seen a tremendous increase in the availability of
   software tools and applications that make efficient use of the varied
   resources of the Internet. Three of these `Networked Information
   Retrieval' (NIR) tools are widely used by the biological community:
   WAIS (Wide Area Information Server), Gopher and, more recently, the
   World-Wide Web (WWW). As many organizations provide WAIS or Gopher
   servers that offer access to SWISS-PROT and PROSITE, we felt that there
   was no need to set up such a service ourselves. But no such server was
   yet available for WWW.

   The World-Wide Web (WWW), which originated at CERN, is a powerful global
   information system merging networked information retrieval and
   hypertext. It gives access, using hypertext links, to the documents and
   information contained in all the existing WWW servers around the world,
   as well as to the data obtainable through other information retrieval
   systems like WAIS, Gopher, X.500, etc. To access a WWW server, one has
   to run a client program (a WWW browser) on a local computer; the browser
   displays hypertext documents. The user can then either request a keyword
   search or jump to another document by following a hypertext link. WWW
   has the outstanding advantages of extending the hypertext model to the
   whole world (by allowing hypertext jumps to documents anywhere on the
   Internet) and of being device and user-interface independent (browsers
   exist for a variety of computers and user interfaces, including Unix
   workstations running the X Window System, Macintoshes and PCs with
   Microsoft Windows).

   A WWW  server has  been set  up by  Ron Appel  from the  group of  Denis
   Hochstrasser at  the Faculty of Medicine of the University of  Geneva on
   the ExPASy  molecular biology  server. It allows access, using the user-
   friendly hypertext  model, to  the SWISS-PROT and SWISS-2DPAGE databases
   and, through  any SWISS-PROT  protein sequence entry, to other databases
   such as EMBL, PROSITE, REBASE, Flybase, PDB and OMIM. Using a browser
   that is able to display images, one can also remotely access 2D gel
   image data from SWISS-2DPAGE.

   A WWW  server can  be accessed  on  the  internet  through  its  Uniform
   Resource Locator  (URL), the addressing system defined by the WWW model.
   The URL for the ExPASy molecular biology WWW server is:

             http://expasy.hcuge.ch/

   or

             http://129.195.254.61/

   To access a WWW server, you need to run a browser (or client) program on
   your local computer. Browsers exist for a variety of machines and may be
   obtained by  anonymous ftp. Here is a selected list (taken from the CERN
   WWW server)  of currently  available browsers  and the  ftp address from
   which they can be retrieved:

   NCSA Mosaic    a very flexible and powerful browser with a graphical
                  user interface. Available for Unix boxes using X11/Motif,
                  for Macintoshes and for Microsoft Windows. FTP site:
                  ftp.ncsa.uiuc.edu (in /Web/xmosaic).

   lynx           a full-screen browser for vt100 terminals using arrow
                  keys, highlighting, etc. FTP site: ftp2.cc.ukans.edu (in
                  /pub/lynx).

   www            a basic  line mode  browser giving access to WWW from any
                  dumb terminal. FTP site: info.cern.ch (in /pub/www).



   To access  all the  data available  from SWISS-2DPAGE,  the user's local
   computer needs  to run  an image  viewing program.  For most browsers on
   Unix workstations the default program is xv, a shareware application
   developed by John Bradley at the University of Pennsylvania. The program
   can be found by ftp at export.lcs.mit.edu (in /contrib).

   For more  information on  the ExPASy  WWW server, please contact Dr. Ron
   Appel:

        Email: appel at cih.hcuge.ch
        Tel: +41-22-372 62 64
        Fax: +41-22-372 61 98

   2.4  Changes in the DR line

   We have  added cross-references  to the  WormPep collection of candidate
   protein translations  from the  Caenorhabditis elegans genome sequencing
   project (see  section 2.2  of these  notes). These  cross-references are
   present in the DR lines:

   Data bank identifier: WORMPEP
   Primary identifier  : Cosmid-derived name  given to that protein by the
                         C.elegans genome sequencing project.  In  general
                         this name will not change.
   Secondary identifier: Number  attributed   by  the   C.elegans   genome
                         sequencing project  to that protein.  This number
                         will change when the sequence is updated.
   Example             : DR   WORMPEP; ZK637.7; CE00437.
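
   For software developers, the example line above can be split into its
   three fields as follows. This is a minimal sketch in Python; the class
   and function names are our own, and the only input format assumed is
   the one shown in the example (fields separated by ';', line terminated
   by '.').

```python
from typing import NamedTuple


class CrossReference(NamedTuple):
    data_bank: str
    primary: str
    secondary: str


def parse_dr_line(line):
    """Split a DR line into data bank, primary and secondary identifier."""
    if not line.startswith("DR"):
        raise ValueError(f"not a DR line: {line!r}")
    # Drop the line code and the terminating period, then split on ';'.
    body = line[2:].strip().rstrip(".")
    fields = [field.strip() for field in body.split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected three fields: {line!r}")
    return CrossReference(*fields)


ref = parse_dr_line("DR   WORMPEP; ZK637.7; CE00437.")
print(ref.data_bank, ref.primary, ref.secondary)
```

   Since only the secondary identifier changes when a sequence is updated,
   a program that tracks WormPep entries across releases would key on the
   primary identifier.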


   2.5  Weekly updates of SWISS-PROT

   Since release 24, we have provided weekly updates of SWISS-PROT. These
   updates are available by anonymous FTP. Three files are updated every
   week:

   new_seq.dat    Contains all the new entries since the last full release.
   upd_seq.dat    Contains the entries for which the sequence data has been
                  updated since the last release.
   upd_ann.dat    Contains the  entries for  which one  or more  annotation
                  fields have been updated since  the last release.

   Currently these  files are  available on  the  following  anonymous  ftp
   servers:

   Organization   EMBL ftp server
   Address        ftp.embl-heidelberg.de (or 192.54.41.33)
   Directory      /pub/databases/swissprot/new

   Organization   ExPASy (Geneva University Expert Protein Analysis System)
   Address        expasy.hcuge.ch  (or 129.195.254.61)
   Directory      /databases/swiss-prot/updates

   Organization   National Center for Biotechnology Information (NCBI)
   Address        ncbi.nlm.nih.gov (or 130.14.20.1)
   Directory      /repository/swiss-prot/updates
   Address        ncbi.nlm.nih.gov (or 130.14.20.1)
   Directory      /repository/swiss-prot/updates
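
   Retrieval of the three update files can be scripted. The following is a
   minimal sketch using the ftplib module of the Python standard library.
   The host names and directories are those listed above; the dictionary
   keys, the function name and the destination directory are our own
   choices, and nothing in the sketch guarantees that a given server is
   reachable when it is run.

```python
import os
from ftplib import FTP

# Host and directory for each server listed above.
MIRRORS = {
    "EMBL": ("ftp.embl-heidelberg.de", "/pub/databases/swissprot/new"),
    "ExPASy": ("expasy.hcuge.ch", "/databases/swiss-prot/updates"),
    "NCBI": ("ncbi.nlm.nih.gov", "/repository/swiss-prot/updates"),
}
UPDATE_FILES = ("new_seq.dat", "upd_seq.dat", "upd_ann.dat")


def fetch_updates(mirror, dest_dir="."):
    """Download the three weekly update files from one of the servers."""
    host, directory = MIRRORS[mirror]
    with FTP(host) as ftp:
        ftp.login()  # no arguments: anonymous login
        ftp.cwd(directory)
        for name in UPDATE_FILES:
            with open(os.path.join(dest_dir, name), "wb") as out:
                ftp.retrbinary(f"RETR {name}", out.write)


# Example call (requires network access):
# fetch_updates("ExPASy", dest_dir="updates")
```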

   !!! Important notes !!!

   Although we try to follow a regular schedule, we do not promise to
   update these files every week. In some cases two weeks will elapse
   between two updates.

   Due to the current mechanism used to build a release, the entries that
   are provided in these updates are not guaranteed to be error free. Also,
   for the same reason, new entries do not contain an OC (Organism
   Classification) line.

                            3. ENZYME AND PROSITE

   3.1  The ENZYME data bank

   Release 14.0 of the ENZYME data bank is distributed with release 27 of
   SWISS-PROT. ENZYME release 14.0 contains information on 3489 enzymes.



   3.2  The PROSITE data bank

   3.2.1  What's new in release 11.0

   Release 11.0 of the PROSITE data bank is distributed with release 27 of
   SWISS-PROT. Release 11.0 contains 715 documentation chapters that
   describe 926 different patterns. Since the last major release of
   PROSITE (release 10.0 of December 1992), 80 new chapters have been
   added and 306 chapters have been updated. The new chapters are listed
   below:

   -  Protein splicing signature
   -  CAP-Gly domain signature
   -  MAM domain signature
   -  Prokaryotic transcription elongation factors signatures
   -  MCM2/3/5 family signature
   -  XPGC protein signatures
   -  Bacterial regulatory proteins, arsR family signature
   -  Bacterial regulatory proteins, deoR family signature
   -  Dps protein family signatures
   -  Ribosomal protein L27 signature
   -  Ribosomal protein L36 signature
   -  mutT domain signature
   -  3-hydroxyisobutyrate dehydrogenase signature
   -  Dihydroorotate dehydrogenase signatures
   -  Alanine dehydrogenase and pyridine nucleotide transhydrogenase
   -  Lysyl oxidase putative copper-binding region signature
   -  6-hydroxy-D-nicotine oxidase and reticuline oxidase FAD-binding
   -  Cytochrome c oxidase subunit VB, zinc binding region signature
   -  Indoleamine 2,3-dioxygenase signatures
   -  Glycine radical signature
   -  Uroporphyrin-III C-methyltransferase signatures
   -  Protein prenyltransferases alpha subunit repeat signature
   -  Phosphatidylinositol 3-kinase signatures
   -  Glutamate 5-kinase signature
   -  Guanylate kinase signature
   -  ADP-glucose pyrophosphorylase signatures
   -  2'-5'-oligoadenylate synthetases signatures
   -  Deoxyribonuclease I signatures
   -  Glucoamylase active site region signature
   -  Trehalase signatures
   -  Glycosyl hydrolases family 8 signature
   -  Prokaryotic transglycosylases signature
   -  Renal dipeptidase active site
   -  Serine proteases, ompT family signatures
   -  Proteasome B-type subunits signature
   -  Signal peptidases II signature
   -  Cytidine & deoxycytidylate deaminases zinc-binding region signature
   -  GTP cyclohydrolase I signatures
   -  Nitrilases / cyanide hydratase signatures
   -  Orn/DAP/Arg decarboxylases family 2 signatures
   -  Uroporphyrinogen decarboxylase signatures
   -  Alpha-isopropylmalate and homocitrate synthases signatures
   -  Beta-eliminating lyases pyridoxal-phosphate attachment site
   -  Dihydroxy-acid and 6-phosphogluconate dehydratases signatures
   -  Prephenate dehydratase signatures
   -  Cysteine synthase pyridoxal-phosphate attachment site
   -  Cys/Met metabolism enzymes pyridoxal-phosphate attachment site
   -  Cytochrome c and c1 heme lyases signatures
   -  Aspartate and glutamate racemases signatures
   -  Mandelate racemase / muconate lactonizing enzyme family signatures
   -  Phosphoglucomutase and phosphomannomutase phosphoserine signature
   -  D-alanine--D-alanine ligase signatures
   -  Carbamoyl-phosphate synthase subdomain signatures
   -  Nickel-dependent hydrogenases b-type cytochrome subunit signatures
   -  Adrenodoxin family, iron-sulfur binding region signature
   -  ABC-2 type transport system integral membrane proteins signature
   -  Acyl-CoA-binding protein signature
   -  LacY family proton/sugar symporters signatures
   -  Sodium:alanine symporter family signature
   -  Sodium:galactoside symporter family signature
   -  Osteopontin signature
   -  Peripherin / rom-1 signature
   -  Interleukins -4 and -13 signature
   -  Erythropoietin signature
   -  Galanin signature
   -  Chaperonins clpA/B signatures
   -  Bacterial type II secretion system protein D signature
   -  Bacterial type II secretion system protein F signature
   -  MARCKS family signatures
   -  Elongation factor 1 beta/beta'/delta chain signatures
   -  Eukaryotic initiation factor 4E signature
   -  Calsequestrin signatures
   -  GTP1/OBG family signature
   -  HIT family signature
   -  Ependymins signatures
   -  Epimorphin family signature
   -  Yeast PIR proteins repeats signature
   -  Oleosins signature
   -  Pollen proteins Ole e I family signature
   -  Hypothetical YCR59c/yigZ family signature

   3.2.2  Future developments

   Starting with the next major release (12.0 of May 1994), PROSITE will
   be extended to include weight matrices (also known as profiles). There
   are a number of protein families as well as functional or structural
   domains that cannot be detected using patterns due to their extreme
   sequence divergence. Typical examples of important functional domains
   which are weakly conserved are the immunoglobulin domains, the SH2 and
   SH3 domains, or the fibronectin type III domain. In such domains there
   are only a few sequence positions which are well conserved. Any attempt
   to build a consensus pattern for such regions will either fail to pick
   up a significant proportion of the protein sequences that contain such
   a region (false negatives) or will pick up too many proteins that do
   not contain the region (false positives). The use of techniques based
   on weight matrices or profiles allows the detection of such proteins or
   domains. Dr. Philipp Bucher at ISREC in Lausanne and I are
   collaborating to include such methods in PROSITE. This collaboration
   also includes other participants such as Roland Luethy (AMGEN), Michael
   Gribskov (SDSC) and Steve Altschul (NCBI). If you are interested in
   participating in this project please contact Philipp Bucher at:

                          pbucher at isrec-sun1.unil.ch
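
   The difference between a pattern and a weight matrix can be illustrated
   with a toy example. The sketch below is in Python; the four-residue
   motif, the matrix scores, the default score and the threshold are all
   invented for illustration and are not taken from PROSITE, nor do they
   reflect the profile format that will be adopted. A pattern requires
   every position to match, whereas a matrix sums per-position scores and
   can therefore tolerate substitutions at weakly conserved positions.

```python
import re

# Invented motif: a pattern must match at every position.
PATTERN = re.compile("GK[ST]T")

# Invented weight matrix: one {residue: score} column per motif position.
# Residues not listed in a column receive DEFAULT_SCORE.
MATRIX = [
    {"G": 3, "A": 1},
    {"K": 3, "R": 2},
    {"S": 2, "T": 2, "A": 1},
    {"T": 3, "S": 1},
]
DEFAULT_SCORE = -2
THRESHOLD = 7  # invented cut-off for calling a match


def window_score(window):
    """Sum the per-position scores of one window of the motif length."""
    return sum(col.get(res, DEFAULT_SCORE) for col, res in zip(MATRIX, window))


def best_window_score(sequence):
    """Highest matrix score over all windows, or None if too short."""
    width = len(MATRIX)
    scores = [
        window_score(sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
    ]
    return max(scores) if scores else None


for seq in ("MAGKSTLL", "MAGRATLL", "MLLPPQQW"):
    by_pattern = bool(PATTERN.search(seq))
    by_matrix = best_window_score(seq) >= THRESHOLD
    print(seq, "pattern:", by_pattern, "matrix:", by_matrix)
```

   In this toy case the second sequence differs from the motif at two
   positions: the pattern misses it, while the matrix still scores it
   above the threshold.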

   We will  include in  the next release notes of SWISS-PROT (Release 28 of
   February 1994)  a brief  description of the PROSITE syntax extension for
   profiles. The full description will be available in the User's Manual of
   release 11.1 of PROSITE (February 1994).

   Important notice for software developers: the integration of profiles
   into PROSITE will not "break" the current format. The profile entries
   in the PROFILE.DAT file will be tagged with the token "MATRIX" on the
   "ID" line (currently, only "PATTERN" and "RULE" are used as tokens); a
   new line type "MA" will be used in these entries to store all the
   parameters specific to weight matrices. The format of the PROFILE.DOC
   file will not be changed.
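
   A parser can prepare for this change by dispatching on the token of the
   "ID" line. The sketch below is in Python and assumes that the "ID" line
   has the layout "ID   ENTRY_NAME; TOKEN."; the function names and the
   entry name EXAMPLE_PROFILE are hypothetical.

```python
KNOWN_TOKENS = {"PATTERN", "RULE", "MATRIX"}


def parse_id_line(line):
    """Return (entry name, token) from a line such as 'ID   NAME; TOKEN.'."""
    if not line.startswith("ID"):
        raise ValueError(f"not an ID line: {line!r}")
    body = line[2:].strip().rstrip(".")
    name, _, token = body.partition(";")
    return name.strip(), token.strip()


def is_profile(line):
    """True if the ID line announces a weight matrix (profile) entry."""
    return parse_id_line(line)[1] == "MATRIX"


for id_line in ("ID   ASN_GLYCOSYLATION; PATTERN.",
                "ID   EXAMPLE_PROFILE; MATRIX."):
    name, token = parse_id_line(id_line)
    if token not in KNOWN_TOKENS:
        print(f"{name}: unknown token {token}, entry skipped")
    elif is_profile(id_line):
        print(f"{name}: profile entry, expect MA lines")
    else:
        print(f"{name}: {token} entry")
```

   Programs that only handle patterns can skip entries whose token is
   "MATRIX" and otherwise continue to read the file unchanged.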



                            4. WE NEED YOUR HELP !

   We welcome feedback from our users. We would especially appreciate it
   if you would notify us when you find that sequences belonging to your
   field of expertise are missing from the data bank. We would also like
   to be notified about annotations to be updated, if, for example, the
   function of a protein has been clarified or if new post-translational
   information has become available.


                         APPENDIX A: SOME STATISTICS



   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.66   Gln (Q) 4.02   Leu (L) 9.20   Ser (S) 7.09
   Arg (R) 5.23   Glu (E) 6.26   Lys (K) 5.81   Thr (T) 5.83
   Asn (N) 4.45   Gly (G) 7.04   Met (M) 2.35   Trp (W) 1.30
   Asp (D) 5.27   His (H) 2.25   Phe (F) 3.99   Tyr (Y) 3.22
   Cys (C) 1.78   Ile (I) 5.55   Pro (P) 5.03   Val (V) 6.52

   Asx (B) 0.005  Glx (Z) 0.005  Xaa (X) 0.02


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala, Ser, Gly, Val, Glu, Thr, Lys, Ile, Asp, Arg, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
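
   The ordering in A.1.2 follows directly from the percentages in A.1.1.
   As a minimal sketch in Python (the variable names are our own; the
   ambiguity codes B, Z and X are left out):

```python
# Percentages copied from table A.1.1.
COMPOSITION = {
    "Ala": 7.66, "Gln": 4.02, "Leu": 9.20, "Ser": 7.09,
    "Arg": 5.23, "Glu": 6.26, "Lys": 5.81, "Thr": 5.83,
    "Asn": 4.45, "Gly": 7.04, "Met": 2.35, "Trp": 1.30,
    "Asp": 5.27, "His": 2.25, "Phe": 3.99, "Tyr": 3.22,
    "Cys": 1.78, "Ile": 5.55, "Pro": 5.03, "Val": 6.52,
}

# Sort the three-letter codes by decreasing frequency.
ranked = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
print(", ".join(ranked))
```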



   A.2  Distribution of the sequences by organism of origin

   Total number of species represented in this release of SWISS-PROT: 4143

        A.2.1 Table of the frequency of occurrence of species

        Species represented 1x: 1879
                            2x:  685
                            3x:  387
                            4x:  247
                            5x:  186
                            6x:  143
                            7x:   88
                            8x:   78
                            9x:   76
                           10x:   48
                       11- 20x:  151
                       21- 50x:  103
                       51-100x:   33
                         >100x:   39

        A.2.2  Table of the most represented species


    Number   Frequency          Species
         1        2530          Human
         2        2376          Escherichia coli
         3        1563          Baker's yeast (Saccharomyces cerevisiae)
         4        1496          Mouse
         5        1385          Rat
         6         654          Bovine
         7         579          Fruit fly (Drosophila melanogaster)
         8         495          Bacillus subtilis
         9         489          Chicken
        10         369          African clawed frog (Xenopus laevis)
        11         341          Salmonella typhimurium
                   341          Rabbit
        13         311          Pig
        14         251          Vaccinia virus (strain Copenhagen)
        15         224          Maize
        16         200          Bacteriophage T4
        17         193          Human cytomegalovirus (strain AD169)
        18         190          Arabidopsis thaliana (Mouse-ear cress)
        19         183          Vaccinia virus (strain WR)
        20         180          Rice
        21         166          Pseudomonas aeruginosa
                   166          Tobacco
        23         164          Pea
        24         162          Wheat
        25         155          Caenorhabditis elegans
                   155          Fission yeast (Schizosaccharomyces pombe)
        27         138          Barley
        28         133          Soybean
        29         131          Slime mold (Dictyostelium discoideum)
        30         130          Spinach
        31         129          Staphylococcus aureus
        32         127          Sheep
        33         119          Marchantia polymorpha (Liverwort)
        34         118          Rhodobacter capsulatus
        35         117          Dog
        36         114          Pseudomonas putida
        37         111          Neurospora crassa
        38         110          Klebsiella pneumoniae
        39         104          Bacillus stearothermophilus

   A.3  Distribution of the sequences by size



               From   To  Number             From   To   Number
                  1-  50    1966             1001-1100      321
                 51- 100    3345             1101-1200      199
                101- 150    4723             1201-1300      156
                151- 200    3191             1301-1400       97
                201- 250    2804             1401-1500       86
                251- 300    2469             1501-1600       44
                301- 350    2289             1601-1700       46
                351- 400    2377             1701-1800       37
                401- 450    1780             1801-1900       43
                451- 500    1933             1901-2000       31
                501- 550    1325             2001-2100       13
                551- 600     922             2101-2200       40
                601- 650     645             2201-2300       48
                651- 700     488             2301-2400       16
                701- 750     475             2401-2500       18
                751- 800     368             >2500          100
                801- 850     270
                851- 900     293
                901- 950     182
                951-1000     189
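
   As a consistency check, the bins of this table add up to the 33329
   entries of the release. The sketch below, in Python, also derives the
   share of sequences of at most 500 residues; the variable names are our
   own, and None stands for the open-ended ">2500" bin.

```python
# (upper bound of the bin, number of sequences), copied from the table.
BINS = [
    (50, 1966), (100, 3345), (150, 4723), (200, 3191), (250, 2804),
    (300, 2469), (350, 2289), (400, 2377), (450, 1780), (500, 1933),
    (550, 1325), (600, 922), (650, 645), (700, 488), (750, 475),
    (800, 368), (850, 270), (900, 293), (950, 182), (1000, 189),
    (1100, 321), (1200, 199), (1300, 156), (1400, 97), (1500, 86),
    (1600, 44), (1700, 46), (1800, 37), (1900, 43), (2000, 31),
    (2100, 13), (2200, 40), (2300, 48), (2400, 16), (2500, 18),
    (None, 100),
]

total = sum(count for _, count in BINS)
up_to_500 = sum(
    count for upper, count in BINS if upper is not None and upper <= 500
)

print(total)
print(f"{up_to_500 / total:.1%} of the sequences have at most 500 residues")
```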




   Currently the ten largest sequences are:


                            RYNR_RABIT  5037 a.a.
                            RYNR_HUMAN  5032 a.a.
                            APB_HUMAN   4563 a.a.
                            APOA_HUMAN  4548 a.a.
                            RRPA_CVMJH  4488 a.a.
                            DYHC_TRIGR  4466 a.a.
                            GRSB_BACBR  4451 a.a.
                            PLEC_RAT    4140 a.a.
                            POLG_BVDV   3988 a.a.
                            VGF1_IBVB   3951 a.a.

                         APPENDIX B: ON-LINE EXPERTS



   B.1  List of on-line experts for PROSITE and SWISS-PROT


Field of expertise            Name               Email address
---------------------------   ------------------ ----------------------------
Alcohol dehydrogenases        Joernvall H.       hans.jornvall at k1m.ki.se
                              Persson B.         bengt.persson at embl-
                                                 heidelberg.de
Aldehyde dehydrogenases       Joernvall H.       hans.jornvall at k1m.ki.se
                              Persson B.         bengt.persson at embl-
                                                 heidelberg.de
Alpha-crystallins/HSP-20      Leunissen J.A.M.   jackl at caos.caos.kun.nl
                              de Jong W.         u629000 at hnykun11.bitnet
Alpha-2-macroglobulins        Van Leuven F.      fred at blekul13.bitnet
AA-tRNA synthetases class II  Leberman R.        leberman at frembl51.bitnet
Apolipoproteins               Boguski M.S.       boguski at ncbi.nlm.nih.gov
AraC family HTH proteins      Ramos J.L.         jlramos at cnbvx3.cnb.uam.es
Arrestins                     Kolakowski L.F.Jr. kolakowski at helix.mgh.
                                                 harvard.edu
Asparaginase / glutaminase    Gribskov M.        gribskov at sdsc.edu
ATP synthase c subunit        Recipon H.         recipon at ncbi.nlm.nih.gov
Band 4.1 family proteins      Rees J.            jrees at vax.oxford.ac.uk
Beta-lactamases               Brannigan J.       jab5 at vaxa.york.ac.uk
Beta-transducin family        Boguski M.S.       boguski at ncbi.nlm.nih.gov
C-type lectin domain          Drickamer K.       drick at cuhhca.hhmi.columbia.
                                                 edu
Chalcone/stilbene synthases   Schroeder J.       raf at sun1.ruf.uni-freiburg.de
Chaperonins cpn10/cpn60       Georgopoulos C.    georgopo at cmu.unige.ch
Chaperonins TCP1 family       Willison K.R.      willison at icr.ac.uk
Chitinases                    Henrissat B.       bernie at cermav.grenet.fr
Clusterin                     Peitsch M.C.       peitsch at ulbio1.unil.ch
Cold shock domain             Landsman D.        landsman at ncbi.nlm.nih.gov
CTF/NF-I                      Mermod N.          nmermod at ulys.unil.ch
                              Gronostajski R.    gronosr at ccsmtp.ccf.org
Cytochromes P450              Holsztynska E.J.   ela at netcom.uucp
                                                 netcom!ela at apple.com
DEAD-box helicases            Linder P.          linder at urz.unibas.ch
Deoxyribonuclease I           Peitsch M.C.       peitsch at ulbio1.unil.ch
dnaJ family                   Kelley W.          kelley at cmu.unige.ch
EF-hand calcium-binding       Cox J.A.           cox at sc2a.unige.ch
                              Kretsinger R.H.    rhk5i at virginia.bitnet
Elongation factor 1           Amons R.           wmbamons at rulgl.leidenuniv.nl
Enoyl-CoA hydratase           Hofmann K.O.       khofmann at biomed.biolan.uni-
                                                 koeln.de
Fatty acid desaturases        Piffanelli P.      piffanelli at jii.afrc.ac.uk
fruR/lacI family HTH proteins Reizer J.          jreizer at ucsd.edu
GATA-type zinc-fingers        Boguski M.S.       boguski at ncbi.nlm.nih.gov
GDP/GTP dissociation stimul.  Boguski M.S.       boguski at ncbi.nlm.nih.gov
GltP family of transporters   Hofmann K.O.       khofmann at biomed.biolan.uni-
                                                 koeln.de
Glucanases                    Henrissat B.       bernie at cermav.grenet.fr
                              Beguin P.          phycel at pasteur.bitnet
Glutamine synthetase          Tateno Y.          ytateno at genes.nig.ac.jp
G-protein coupled receptors   Chollet A.         arc3029 at ggr.co.uk
                              Attwood T.K.       bph6tka at biovax.leeds.ac.uk
                              Kolakowski L.F.Jr. kolakowski at helix.mgh.
                                                 harvard.edu
GTPase-activating proteins    Boguski M.S.       boguski at ncbi.nlm.nih.gov
HIT family                    Seraphin B.        seraphin at embl-heidelberg.de
HMG1/2 and HMG-14/17          Landsman D.        landsman at ncbi.nlm.nih.gov
Inorganic pyrophosphatases    Kolakowski L.F.Jr. kolakowski at helix.mgh.
                                                 harvard.edu
Integrases                    Roy P.H.           2020000 at saphir.ulaval.ca
Kringle domain                Ikeo K.            kikeo at genes.nig.ac.jp
Lipocalins                    Boguski M.S.       boguski at ncbi.nlm.nih.gov
                              Peitsch M.C.       peitsch at ulbio1.unil.ch
lysR family HTH proteins      Henikoff S.        henikoff at sparky.fhcrc.org
MAC components / perforin     Peitsch M.C.       peitsch at ulbio1.unil.ch
Malic enzymes                 Glynias M.         mglynias at ncsa.uiuc.edu
MAM domain                    Bork P.            bork at embl-heidelberg.de
MIP family proteins           Reizer J.          jreizer at ucsd.edu
Myelin proteolipid protein    Hofmann K.O.       khofmann at biomed.biolan.uni-
                                                 koeln.de
Pancreatic trypsin inhibitor  Ikeo K.            kikeo at genes.nig.ac.jp
PEP requiring enzymes         Reizer J.          jreizer at ucsd.edu
pfkB carbohydrate kinases     Reizer J.          jreizer at ucsd.edu
Phytochromes                  Partis M.D.        partis at gcri.afrc.ac.uk
Plant viruses icosahedral     Koonin E.V.        koonin at ncbi.nlm.nih.gov
capsid proteins
Protein kinases               Hanks S.           hanks at vuctrvax.bitnet
                              Hunter T.          hunter at salk-sc2.sdsc.edu
PTS proteins                  Reizer J.          jreizer at ucsd.edu
Restriction-modification      Bickle T.          bickle at urz.unibas.ch
            enzymes           Roberts R.J.       roberts at neb.com
Ribosomal protein S15         Ellis S.R.         srelli01 at ulkyvm.bitnet
Ring-cleavage dioxygenases    Harayama S.        sharayam at ddbj.nig.ac.jp
Signal sequence peptidases    von Heijne G.      gvh at csb.ki.se
                              Dalbey R.E.        rdalbey at magnus.acs.ohio-
                                                 state.edu
Sodium symporters             Reizer J.          jreizer at ucsd.edu
Subtilases                    Brannigan J.       jab5 at vaxa.york.ac.uk
                              Siezen R.J.        nizo at caos.caos.kun.nl
Thiol proteases               Turk B.            turk at ijs.ac.mail.yu
Thiol proteases inhibitors    Turk B.            turk at ijs.ac.mail.yu
TNF family                    Jongeneel C.V.     vjongene at isrecmail.unil.ch
TPR repeats                   Boguski M.S.       boguski at ncbi.nlm.nih.gov
Transit peptides              von Heijne G.      gvh at csb.ki.se
Type-II membrane antigens     Levy S.            levy at cellbio.stanford.edu
Uracil-DNA glycosylase        Aasland R.         aasland at embl-heidelberg.de
Vitamin K-depend. Gla domain  Price P.A.         pprice at ucsd.edu
XPGC protein                  Clarkson S.G.      clarkson at cmu.unige.ch
Xylose isomerase              Jenkins J.         jenkins at frira.afrc.ac.uk
WAP-type domain               Claverie J.-M.     jmc at ncbi.nlm.nih.gov
ZP domain                     Bork P.            bork at embl-heidelberg.de


African swine fever virus     Yanez R.J.         ryanez at cbm2.uam.es
Bacteriophage P4              Halling C.         chh9 at midway.uchicago.edu
Caenorhabditis elegans        Sonnhammer E.      esr at mrc-molecular.biology.
                                                 cam.ac.uk
Chloroplast encoded proteins  Hallick R.B.       hallick at arizona.edu
Drosophila                    Ashburner M.       ma11 at phx.cam.ac.uk
Escherichia coli              Rudd K.            rudd at ncbi.nlm.nih.gov
Salmonella typhimurium        Rudd K.            rudd at ncbi.nlm.nih.gov
Snakes                        Stocklin R.        stocklin at cmu.unige.ch




   B.2  Requirements for becoming an on-line expert

   An expert should be a scientist who works with one or more specific
   families of proteins (or specific domains) and who would:

   a) Review the protein sequences in SWISS-PROT and the patterns/matrices
      in PROSITE relevant to their field of research.
   b) Agree to be contacted by people who have obtained new sequence(s)
      that seem to belong to "their" family or families of proteins.
   c) Have access to electronic mail and be willing to use it to send and
      receive data.

   If you are willing to be part of this scheme please contact Amos Bairoch
   at the following electronic mail address:

                             bairoch at cmu.unige.ch


           APPENDIX C: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES

   The current  status of the relationships (cross-references) between some
   biomolecular databases is shown in the following schematic:


                                                       **********************
                        ***********************        * EPD [Euk. Promot.] *
                        *  EMBL Nucleotide    * <----> **********************
                        *  Sequence Data      *
******************      *  Library            *        **********************
* FLYBASE        * <--> *********************** <----- * ECD [E. coli map]  *
* [Drosophila    *                ^         ^          **********************
* genomic d.b.]  * <------+       |         |
******************        |       |         +--------- **********************
                          |       |                    * TFD [Trans. fact.] *
                          |       |         +--------> **********************
                          |       |         |  
******************        v       v         v          **********************
* REBASE         *      ***********************        * ENZYME [Nomencl.]  *
* [Restriction   * <--- *  SWISS-PROT         * <----- **********************
*  enzymes]      *      *  Protein Sequence   *            |
******************      *  Data Bank          *            v
                        ***********************        **********************
******************       ^  ^  |  |  ^   ^  |          * OMIM   [Diseases]  *
* EcoGene/EcoSeq *       |  |  |  |  |   |  +--------> **********************
* [E. coli]      * <-----+  |  |  |  |   |
******************          |  |  |  |   +-----------> **********************
                            |  |  |  |                 * ECO2DBASE     [2D] *
                            |  |  |  |                 **********************
******************          |  |  |  |
* PROSITE        * <--------+  |  |  +---------------> **********************
* [Patterns]     *             |  |                    * SWISS-2DPAGE  [2D] *
******************             |  +---------------+    **********************
             |                 v                  |                          
             |          ***********************   |    **********************
             +--------> * PDB [3D structures] *   +--> * Aarhus/Ghent  [2D] *
                        ***********************        **********************