New Genefinder service: NNSSP

Dan Davison dbd at THEORY.BCHS.UH.EDU
Tue Oct 25 10:22:48 EST 1994


      The Baylor College Of Medicine Computational Biology Group
			     Houston, TX
		       announces a new service

                                NNSSP

        Prediction of protein secondary structure by combining
     nearest-neighbor algorithms and multiple sequence alignments
                         (version 1. 10.5.94)

***************************************************************************
*********** NOTE ADDRESSES AND FORMATS HAVE CHANGED!! *********************
***************************************************************************

Analysis of protein primary sequences is available through the
University of Houston Gene-Server. Send a file containing the
sequence (with the sequence name on the first line) to
        service at bchs.uh.edu
with the subject line "nnssp".

Example: mail -s nnssp service at bchs.uh.edu < test.seq

where test.seq is the file containing the sequence.
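If you would rather script submissions than use mail(1), here is a minimal Python sketch that builds the same message with the standard library. The function names, the sender address, and the assumption of an SMTP relay on "localhost" are all hypothetical; the only parts taken from this announcement are the subject line "nnssp" and the fact that the sequence file goes in the message body.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_nnssp_request(body: str, sender: str, to_addr: str) -> EmailMessage:
    """Build a message equivalent to `mail -s nnssp <to_addr> < test.seq`."""
    msg = EmailMessage()
    msg["Subject"] = "nnssp"  # the service requires this exact subject line
    msg["From"] = sender
    msg["To"] = to_addr
    msg.set_content(body)  # the contents of the sequence file form the body
    return msg


def submit_file(path: str, sender: str, to_addr: str,
                smtp_host: str = "localhost") -> None:
    """Read a sequence file and send it (assumes an SMTP relay at smtp_host)."""
    msg = build_nnssp_request(Path(path).read_text(), sender, to_addr)
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

For example, submit_file("test.seq", "you@example.org", SERVICE), where SERVICE holds the service address given above written with "@".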
 
Method description:
************************
Yi and Lander (*) developed a neural-network and nearest-neighbor
method for predicting protein secondary structure, with a scoring
system that combined a sequence similarity matrix with the local
structural environment scoring scheme of Bowie et al. (**).  We have
improved their scoring system by taking into account the N- and
C-terminal positions of a-helices and b-strands, and by treating
b-turns as a distinct type of secondary structure. A further
improvement, which also significantly decreases computation time, is
to restrict the database to a smaller subset of proteins that are
similar to the query sequence. Using multiple sequence alignments
rather than single sequences, together with a simple jury decision
method, we achieved an overall three-state accuracy of 72.2%. This is
better than the accuracy of the most accurate multilayered neural
network approach tested on the same data set of 126 non-homologous
protein chains.

(*) Yi T.-M., Lander E.S. (1993)
  Protein secondary structure prediction using nearest-neighbor methods.
  J. Mol. Biol., 232:1117-1129.

(**) Bowie J.U., Luthy R., Eisenberg D. (1991)
  A method to identify protein sequences that fold into a known
  three-dimensional structure.
  Science, 253:164-170.
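The announcement does not spell out the jury decision. One plausible reading, sketched below in Python as an assumption rather than the authors' actual procedure, is this: each sequence of the alignment gets its own prediction, and the states are combined column by column by majority vote, with ties going to the query's own prediction. The function name jury_decision and the use of "-" for columns where a sequence has a gap (and therefore no vote) are hypothetical.

```python
from collections import Counter


def jury_decision(predictions: list[str]) -> str:
    """Majority vote per aligned column over per-sequence predictions.

    predictions[0] belongs to the query, which may not contain gaps ('-').
    Ties go to the query's state; failing that, to the alphabetically first
    tied state, so the result is deterministic.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    query = predictions[0]
    if "-" in query:
        raise ValueError("the query sequence must not contain gaps")
    if any(len(p) != len(query) for p in predictions):
        raise ValueError("all predictions must span the same aligned columns")

    consensus = []
    for col, query_state in enumerate(query):
        # Sequences with a gap in this column do not vote.
        votes = Counter(p[col] for p in predictions if p[col] != "-")
        top = max(votes.values())
        leaders = sorted(state for state, n in votes.items() if n == top)
        consensus.append(query_state if query_state in leaders else leaders[0])
    return "".join(consensus)
```

For example, jury_decision(["aab", "abb", "bb-"]) gives "abb". Letting the query break ties keeps the result tied to the one sequence that is guaranteed to have no gaps.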



Accuracy:
************************
   Overall three-state (a, b, c) prediction gives ~67.6% correctly
   predicted residues on 126 non-homologous proteins, using the
   jack-knife test procedure.

   Using multiple sequence alignments instead of single sequences
   increases prediction accuracy to 72.2%.
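Three-state accuracy is the percentage of residues whose predicted state (a = helix, b = strand, c = coil) matches the observed state. The Python sketch below assumes the figure is pooled over all residues of all test chains; the announcement does not say whether per-chain values are averaged instead. It also does not model the jack-knife step, in which each chain is predicted with that chain left out of the database. The function name is hypothetical.

```python
def three_state_accuracy(chains: list[tuple[str, str]]) -> float:
    """Pooled per-residue accuracy (%) over (predicted, observed) pairs.

    Each string holds one state character per residue, e.g. 'a', 'b' or 'c'.
    """
    correct = total = 0
    for predicted, observed in chains:
        if len(predicted) != len(observed):
            raise ValueError("predicted and observed states differ in length")
        correct += sum(p == o for p, o in zip(predicted, observed))
        total += len(observed)
    if total == 0:
        raise ValueError("no residues to score")
    return 100.0 * correct / total
```

For example, two chains scored as ("aabc", "aabb") and ("cc", "cc") have 5 of 6 residues correct, giving about 83.3%.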

Submitting sequences via email:
***********************************
  For email submission, the sequences must have the following format:
  a) if you send one sequence:
  line 1 - sequence name
  line 2 - the number 1 in Fortran format I5 (right-justified in 5 columns)
  line 3 and following - amino acid sequence
  for example:
  ADENYLATE KINASE     
      1		
  RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA
  KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY
  QIDTVINLNVPFEVIKQRLTARWIHPGSGRVYNIEFNPPKTMGIDDLTGE
  PLVQREDDRPETVVK............
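   (Restrict line length to 75 characters.)
   b) if you send multiple aligned sequences:
  line 1 - sequence name
  line 2 - number of aligned sequences and length of the protein
           (two right-justified 5-column fields, as in the example)
  line 3 and following - aligned sequences in Fortran format 60A1, i.e.
           blocks of 60 residues per line, one line per sequence per block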
  for example:

ACTINOXANTHIN                                                         
    5  107
        10        20        30        40        50        60 (numbers not necessary)     
APAFSVSPASGASDGQSVSVSVAAAGETYYIAQaAPVGGQDAaNPATATSFTTDASGAAS
APAFSVSPASGLSDGQSVSVSGAAAGETYYIAQCAPVGGQDACNPATATSFTTDASGAAS
APTATVTPSSGLSDGTVVKVAGAgaGTAYDVGQCAWVdgVLACNPADFSSVTADANGSAS
APGVTVTPATGLSNGQTVTVSATgpGTVYHVGQCAVvpGVIGCDATTSTDVTADAAGKIT
ATPKSSSGGAGASTGSGTSSAAVTSgaASSAQQSGLQGATGAGGGSSSTPGTQPGSGAGG
        70        80        90       100      
FSFTVRKSYAGQTPSGTPVGSVDbATDAbNLGAGNSGLNLGHVALTF
FSFV-RKSYAGZTPSGTPVGSVDCATDACNLGAGNSGLNLGHVALTF
TSLTVRRSFEGFLFDGTRWGTVDCTTAACQVGLSDAAGNGpgVAISF
AQLKVHSSFQAVvaNGTPWGTVNCKVVSCSAGLGSDSGEGAAQAITF
AIAARPVSAMGGtpPHTVPGSTNTTTTAMAGGVGGPgaNPNAAALM-
 
(Lowercase letters may be used for Cys residues, if you wish.)

The alignment MUST NOT contain deletions (gaps) in the first (query) sequence!!!
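The two submission formats above can be produced with a short Python sketch. The function names are hypothetical. The layout details are read off the examples: 50 residues per line in the single-sequence example, a second line of two 5-column fields ("    5  107") in the alignment example, 60-column blocks, and "-" as the gap character. The alignment function also rejects a query sequence that contains "-".

```python
BLOCK = 60  # residues per line in the aligned format (60A1)


def format_single(name: str, sequence: str, width: int = 50) -> str:
    """Lay out one sequence in submission format (a)."""
    if not 0 < width <= 75:
        raise ValueError("keep lines at most 75 characters long")
    residues = "".join(sequence.split())  # drop spaces and newlines
    lines = [name, f"{1:5d}"]  # line 2: the number 1, Fortran I5 style
    lines += [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join(lines) + "\n"


def format_alignment(name: str, sequences: list[str]) -> str:
    """Lay out aligned sequences (query first) in submission format (b)."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sequences must all have the same length")
    if "-" in sequences[0]:
        raise ValueError("the first (query) sequence must not contain deletions")
    # Line 2: number of sequences and protein length, e.g. "    5  107".
    lines = [name, f"{len(sequences):5d}{length:5d}"]
    for start in range(0, length, BLOCK):
        # One block: the same 60 columns of every sequence, query first.
        lines += [s[start:start + BLOCK] for s in sequences]
    return "\n".join(lines) + "\n"
```

Write the returned string to test.seq and mail it as described above.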

   Send the file containing the sequence(s) to:
   service at bchs.uh.edu
   The subject line must be:
   nnssp
Example: mail -s nnssp service at bchs.uh.edu < test.seq


Example of NNSSP output:		
*****************************
   ADENYLATE KINASE     
                    10        20        30        40        50
 Predic     aaaaaaa                  bbb      aaaaaaaa      aa
 a/acid     RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA
                    60        70        80        90       100
 Predic     aaaaaa      aaaaaaaaaaaaaa              aaaaaaaaaa
 a/acid     KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY
                   110       120       130       140       150
 Predic        bbbb    aaaaaaaa   bb      bbbbbb              
 a/acid     QIDTVINLNVPFEVIKQRLTARWIHPGSGRVYNIEFNPPKTMGIDDLTGE
                   160       170       180       190       200
 Predic              aaaaaaaaaaa   aaaaaaaaaa   bb         aaa
 a/acid     PLVQREDDRPETVVKRLKAYEAQTEPVLEYYRKKGVLETFSGTETNKIWP
                   210
 Predic     aaaaaaaa      
 a/acid     HVYAFLQTKLPQRS
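To post-process results, the output above can be read back into one residue string and one state string. This Python sketch assumes that each "Predic" line is followed by its "a/acid" line, that residues and prediction characters start in the same column (spaces, not tabs), and that blank prediction positions mean coil; it reports coil as "c". The function name is hypothetical.

```python
def parse_nnssp_output(text: str) -> tuple[str, str]:
    """Return (sequence, prediction), one prediction character per residue."""
    residues_out, states_out = [], []
    predic_line = ""
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("Predic"):
            predic_line = line
        elif stripped.startswith("a/acid"):
            # Residues begin after the label and the spaces that follow it.
            label_end = line.index("a/acid") + len("a/acid")
            rest = line[label_end:]
            start = label_end + len(rest) - len(rest.lstrip())
            residues = line[start:].rstrip()
            # Trailing blanks of the Predic line may have been trimmed: pad.
            states = predic_line[start:start + len(residues)].ljust(len(residues))
            residues_out.append(residues)
            states_out.append(states.replace(" ", "c"))
            predic_line = ""
        # Ruler lines and the title line are ignored.
    return "".join(residues_out), "".join(states_out)
```

Because the states string lines up with the residues one-to-one, it can be passed directly to an accuracy or segment-counting routine.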

Reference:

Salamov A.A., Solovyev V.V. (1994)
  Prediction of protein secondary structure by combining nearest-neighbor
  algorithms and multiple sequence alignments.
  Submitted to J. Mol. Biol.


Problems, comments, and suggestions:
   Please send them to solovyev at cmb.bcm.tmc.edu.
   



