Mail service for protein domain and fold-class prediction

Martin Reczko reczko at hermes.informatik.uni-stuttgart.de
Thu Feb 16 10:39:57 EST 1995


     THE DEF DATA BASE AND MAIL SERVICE FOR
  SEQUENCE BASED PROTEIN FOLD CLASS PREDICTIONS

                   Martin Reczko 
Department of Molecular Biophysics, German Cancer Research Center,
                Heidelberg, Germany and
                    Henrik Bohr
       Center for Biological Sequence Analysis, 
        The Technical University of Denmark,
                 Lyngby, Denmark


The DEF (Database for Expected Fold-classes) and its mail service provide
protein fold-class and protein domain predictions for sequences in the
SWISS-PROT protein sequence database or for individual sequences. In the
DEF output, each amino acid sequence is assigned a specific overall
fold-class, a super fold-class (based on secondary structure content
and its spatial distribution) and a profile of possible fold-classes
along the sequence. Protein domains are defined from this fold-class
profile. The assigned fold-class is one of 45 well-known folds derived
from the 3-dimensional protein structures in the Brookhaven Protein
Data Bank (PDB). Most of these 45 fold-classes are contained in the set
"3d-ali" given by Pascarella and Argos, Prot. Eng. 5:121-137 (1992). In
this context, folds are protein domains with a distinct backbone
topology in their 3-dimensional structure.

Performance

The prediction among the 44 classes is correct in 77 % of 130 test cases
(random guessing would be correct in 2.3 % of cases). Sequences with 0
to 25 % sequence identity to the proteins of the training set are
predicted correctly in more than 70 % of the cases.
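The random baseline quoted above matches uniform guessing among the 44 classes (an assumption about how the baseline was computed, though it reproduces the stated figure). The same arithmetic gives the approximate number of correctly predicted test sequences:

```latex
\frac{1}{44} \approx 0.0227 \approx 2.3\,\%, \qquad 0.77 \times 130 \approx 100 \ \text{correctly predicted test sequences}
```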

The 4 superclasses are all-alpha, alpha*beta, alpha+beta and all-beta.
Here, alpha*beta stands for proteins in which alpha-helices and
beta-sheets are intertwined, while in alpha+beta proteins the
alpha-helices and beta-sheets are separated into distinct domains. The
prediction of the 4 superclasses is correct in 90.4 % of the test cases.

The predictions are generated by artificial neural networks, as described in:

  Reczko, M. and Bohr, H., The DEF Data Base of Sequence Based Protein
  Fold Class Predictions, Nucleic Acids Res. 22, p. 3616-3619 (1994)

  Reczko, M., Bohr, H., Sudhakar, P. V., Hatzigeorgiou, A. and
  Subramaniam, S., Fold Class Prediction by Neural Networks, In:
  Protein Structures by Distance Analysis, p. 277-286, Eds. Bohr, H. and
  Brunak, S., IOS Press (1994)

Availability:

The DEF mail server for individual predictions:
   An automatic mail server that makes fold-class predictions for any
   submitted sequence. Send a mail to
        def at mbp-sgi4.inet.dkfz-heidelberg.de
   containing your sequence in single-letter code, either in the Subject
   line, or in the message body with an empty Subject line.
   No sequence line may be longer than 120 residues: longer sequences
   must be split into several lines by carriage returns (shorter lines
   are fine).
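The submission rules above can be illustrated with a short Python sketch built on the standard `email` package. It only builds the request and does not send it. The helper names `wrap_sequence` and `build_def_request`, the 60-residue line width and the check that the sequence contains only letters are illustrative assumptions, not part of the DEF service. The address is the one given above with " at " written as "@", and the 1995 host may no longer accept mail.

```python
from email.message import EmailMessage

# Address from the announcement with " at " written as "@" (the host dates from 1995).
DEF_ADDRESS = "def@mbp-sgi4.inet.dkfz-heidelberg.de"
MAX_LINE = 120  # sequence lines longer than this must be split


def wrap_sequence(sequence: str, width: int = 60) -> str:
    """Return the sequence in single-letter code as lines of at most `width` residues.

    The 60-residue default is an arbitrary choice; any width up to 120 is allowed.
    """
    if not 0 < width <= MAX_LINE:
        raise ValueError(f"width must be between 1 and {MAX_LINE}")
    residues = "".join(sequence.split()).upper()  # drop spaces and old line breaks
    # Assumption (hypothetical check): only plain ASCII letters are sent.
    if not residues or not (residues.isascii() and residues.isalpha()):
        raise ValueError("sequence must be non-empty single-letter amino acid code")
    return "\n".join(residues[i:i + width] for i in range(0, len(residues), width))


def build_def_request(sequence: str, sender: str) -> EmailMessage:
    """Build, but do not send, a request with the sequence in the message body."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = DEF_ADDRESS
    msg["Subject"] = ""  # empty Subject line: the sequence is read from the body
    msg.set_content(wrap_sequence(sequence))
    return msg
```

Printing `msg.as_string()` shows the finished request. Sending it would additionally require a reachable mail relay, for example through `smtplib.SMTP(...).send_message(msg)`. Putting the sequence in the body rather than the Subject line avoids any header-length limits for long sequences.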

Anonymous ftp address:
  mbp-sgi4.inet.dkfz-heidelberg.de  or 193.174.48.50
  cd /pub/databases/def

  Currently HUMAN, ECOLI, YEAST, MOUSE, DROME (Drosophila melanogaster),
   CAEEL (Caenorhabditis elegans), BOVINE and RAT
   proteins are available.

  *** Other proteins may be predicted using the DEF mailserver ***
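Retrieving the precomputed predictions can be sketched with Python's standard `ftplib` module. This assumes the anonymous FTP server and directory named above are still reachable (they date from 1995 and may not be). The function names `list_def_files` and `download` are hypothetical, and because the file names in the directory are not given here, the sketch returns the listing instead of guessing them.

```python
from __future__ import annotations

from ftplib import FTP

# Host and directory from the announcement; the server may no longer exist.
DEF_FTP_HOST = "mbp-sgi4.inet.dkfz-heidelberg.de"  # numeric address: 193.174.48.50
DEF_FTP_DIR = "/pub/databases/def"


def list_def_files(host: str = DEF_FTP_HOST, directory: str = DEF_FTP_DIR,
                   timeout: float = 30.0) -> list[str]:
    """Log in anonymously, change to the DEF directory and return its file names."""
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()         # no arguments: anonymous login
        ftp.cwd(directory)  # same as "cd /pub/databases/def"
        return ftp.nlst()


def download(name: str, host: str = DEF_FTP_HOST, directory: str = DEF_FTP_DIR,
             timeout: float = 30.0) -> None:
    """Fetch one file in binary mode into the current local directory."""
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()
        ftp.cwd(directory)
        # Open the local file only after the connection succeeded.
        with open(name, "wb") as out:
            ftp.retrbinary(f"RETR {name}", out.write)
```

For example, `for name in list_def_files(): print(name)` prints the directory listing, after which `download(name)` fetches a single file. If the host name no longer resolves, the numeric address can be passed as `host`; connection failures surface as exceptions from `ftplib.all_errors`.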

Contact:
Martin Reczko, Molecular Biophysics (0810)
German Cancer Research Center, 69120 Heidelberg, Germany.
Telephone: +49-6221-422338, Telefax: +49-6221-422885
email: reczko at dkfz-heidelberg.de
--
__________________________________________________
Dept. of Molecular Biophysics (0810)
German Cancer Research Center
Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
Tel: (49) 6221-422338, FAX: (49) 6221-422333
email: reczko at dkfz-heidelberg.de