NNSSP and SSP SERVER FORMAT CHANGE- SSP Details

Dan Davison dbd at THEORY.BCHS.UH.EDU
Mon Nov 7 09:59:20 EST 1994


The recently announced secondary structure prediction programs
NNSSP and SSP have had a minor but important change to their
*required* input.  Please note the following:
      The Baylor College Of Medicine Computational Biology Group
			     Houston, TX
		       announces a new service

                                 SSP
       Prediction of a-helix and b-strand segments of globular
                               proteins

***************************************************************************
*********** NOTE FORMAT HAS CHANGED AGAIN!!           *********************
***************************************************************************

Analysis of uncharacterized human sequences is available by sending
a file containing a sequence name on the first line and the sequence on
subsequent lines (no more than 80 char/line) to

                      service at bchs.uh.edu

 with the Subject line "SSP". 

Example: mail -s SSP service at bchs.uh.edu < test.seq

where test.seq is a file containing the sequence.

NOTE: This service is temporarily being provided through the
University of Houston Gene-Server.  Only two jobs will be run at a
time.
 
Method description:
**********************
   Our segment-oriented method is designed to locate secondary structure 
   elements and uses linear discriminant analysis to assign segments of a  
   given amino acid sequence to a particular type of secondary structure,
   by taking into account the amino acid composition of internal parts 
   of segments as well as their terminal and adjacent regions.
   Four linear discriminant functions were constructed to recognize
   short and long a-helix segments and short and long b-strand segments.
   These functions combine 3 characteristics: the hydrophobic moment and
   the segment singlet and pair preferences for an a-helix or b-strand.
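
As an illustration of the first of these characteristics, here is a
minimal sketch of a hydrophobic moment calculation. It assumes the
common Eisenberg-style definition and the Kyte-Doolittle hydropathy
scale; the announcement does not say which scale, angles, or
normalization SSP itself uses, so the function name, the scale, and the
angles below are assumptions made for illustration only.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale; not stated in the source).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, angle_deg):
    """Mean hydrophobic moment of a segment for a given turn angle.

    angle_deg is the angle between successive side chains: about 100
    degrees for an a-helix, about 160-180 degrees for a b-strand.
    """
    if not segment:
        raise ValueError("segment must not be empty")
    delta = math.radians(angle_deg)
    residues = segment.upper()
    sin_sum = sum(KD[aa] * math.sin(i * delta) for i, aa in enumerate(residues))
    cos_sum = sum(KD[aa] * math.cos(i * delta) for i, aa in enumerate(residues))
    return math.hypot(sin_sum, cos_sum) / len(residues)


# Hypothetical usage on the first residues of the adenylate kinase example.
helix_like = hydrophobic_moment("RLLRAIMGA", 100.0)
strand_like = hydrophobic_moment("RLLRAIMGA", 170.0)
```

A large value at 100 degrees suggests an amphipathic helix-like
periodicity; a large value near 170 degrees suggests a strand-like
alternation.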

Accuracy:
************************
   Overall 3-state (a, b, c) prediction gives ~65.1% correctly predicted
   residues (68.2% when homologous sequences are used) on 126
   non-homologous proteins using the jack-knife test procedure. This
   accuracy is good when no homologous sequences are available: the
   method of Rost and Sander (J. Mol. Biol., 1993, 232, 584-599) has
   about 71% accuracy with such sequences and about 61% without them.
	Analysis of the prediction results shows a high prediction
   accuracy for long secondary structure segments (~89% of a-helices of
   length greater than 8 and ~71% of b-strands of length greater than 6
   are correctly located, with probability of correct prediction 0.82
   and 0.78 respectively).
  	Using the mean values of discriminant functions over the aligned 
   sequences of homologous proteins, we achieved a prediction accuracy of
   68.2%. 
	It should be noted that our variant of the nearest-neighbor
   algorithm has 72% accuracy when using multiple sequence alignments of
   homologous proteins and 67.6% accuracy without homologous proteins
       (see the "nnssp" program on this server).
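
The per-residue figures above are percentages of correctly predicted
residues over the three states a, b and c. A minimal sketch of that
calculation for a single protein follows; the function name and the two
example strings are hypothetical and are not taken from the SSP test
set.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of residues whose predicted state (a, b or c) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty assignment")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up 10-residue example: 8 of 10 states agree.
predicted = "aaaacccbbb"
observed = "aaaccccbbc"
accuracy = three_state_accuracy(predicted, observed)
```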

Submitting sequences via email:
***********************************
  For email submission the sequences must have the following format: 
a) if you send one sequence:
  line 1 - sequence name
  line 2 - the number 1 in format I5 (an integer field of width 5)
  line 3 and subsequent lines - amino acid sequence
  for example :
  ADENYLATE KINASE     
      1		
  RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA
  KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY
  QIDTVINLNVPFEVIKQRLTARWIHPGSGRVYNIEFNPPKTMGIDDLTGE
  PLVQREDDRPETVVK............
   (Restrict the line length to 75 characters).
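
A minimal sketch of writing a file in this single-sequence layout is
given below. The function name is hypothetical. It assumes that "format
I5" means the number right-justified in a 5-character field, that lines
start in column 1, and that 50 residues per line (as in the example
above) is acceptable; none of this has been checked against the server.

```python
def format_single_sequence(name, sequence, width=50):
    """Build the text of a single-sequence SSP submission.

    Line 1 is the name, line 2 is the number 1 in a width-5 field,
    and the remaining lines hold the sequence, `width` residues each.
    """
    if not 1 <= width <= 75:
        raise ValueError("line length must be between 1 and 75 characters")
    residues = "".join(sequence.split())  # drop spaces and line breaks
    lines = [name, f"{1:5d}"]
    lines += [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join(lines) + "\n"


text = format_single_sequence(
    "ADENYLATE KINASE",
    "RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLAKTFID",
)
```

The resulting text could be saved as test.seq and mailed as described
below.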

b) if you send multiple aligned sequences

  line 1 - sequence name
  line 2 - number of aligned sequences and length of protein
  line 3 and subsequent lines - aligned sequences in format 60a1
    (where the 3rd line, like the other lines that separate blocks of
     the aligned sequences, is either empty or contains position numbers)
for example:

ACTINOXANTHIN                                                         
    5  107
        10        20        30        40        50        60   (numbers not necessary)
APAFSVSPASGASDGQSVSVSVAAAGETYYIAQaAPVGGQDAaNPATATSFTTDASGAAS
APAFSVSPASGLSDGQSVSVSGAAAGETYYIAQCAPVGGQDACNPATATSFTTDASGAAS
APTATVTPSSGLSDGTVVKVAGAgaGTAYDVGQCAWVdgVLACNPADFSSVTADANGSAS
APGVTVTPATGLSNGQTVTVSATgpGTVYHVGQCAVvpGVIGCDATTSTDVTADAAGKIT
ATPKSSSGGAGASTGSGTSSAAVTSgaASSAQQSGLQGATGAGGGSSSTPGTQPGSGAGG
        70        80        90       100      
FSFTVRKSYAGQTPSGTPVGSVDbATDAbNLGAGNSGLNLGHVALTF
FSFV-RKSYAGZTPSGTPVGSVDCATDACNLGAGNSGLNLGHVALTF
TSLTVRRSFEGFLFDGTRWGTVDCTTAACQVGLSDAAGNGpgVAISF
AQLKVHSSFQAVvaNGTPWGTVNCKVVSCSAGLGSDSGEGAAQAITF
AIAARPVSAMGGtpPHTVPGSTNTTTTAMAGGVGGPgaNPNAAALM-
 
(you can use lowercase letters for Cys residues, if you want)

The alignment MUST NOT contain deletions (gaps) in the 1st (query) sequence!!!
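
The alignment layout and the no-gap rule for the query can be sketched
as follows. The function name is hypothetical. The sketch assumes that
the two numbers on line 2 are each written in a 5-character field (which
matches the "    5  107" header in the example), that "-" is the only gap
character, and that an empty line separates the 60-column blocks.

```python
def format_alignment(name, aligned, block=60):
    """Build the text of a multiple-alignment SSP submission.

    `aligned` is a list of equal-length aligned sequences; the first
    one is the query and must not contain gaps.
    """
    if not aligned:
        raise ValueError("at least one sequence is required")
    query = aligned[0]
    length = len(query)
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    if "-" in query:
        raise ValueError("the query (first) sequence must not contain gaps")
    lines = [name, f"{len(aligned):5d}{length:5d}"]
    for start in range(0, length, block):
        lines.append("")  # separator line; position numbers are optional
        lines += [seq[start:start + block] for seq in aligned]
    return "\n".join(lines) + "\n"


# Made-up two-sequence alignment, 66 columns long.
query = "ACDEFG" * 11
homolog = "ACDEF-" * 11
text = format_alignment("TEST PROTEIN", [query, homolog])
```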


   Send the file containing the sequence to: 
           service at bchs.uh.edu
The subject line must be:
   Subject: SSP
Example: mail -s SSP service at bchs.uh.edu < test.seq


Example of SSP output:		
*****************************
   ADENYLATE KINASE     
                    10        20        30        40        50
   pred A:    aaaaaaaaa          aaaaaaaaa     aaaaaaaaa     aaa
   AA         N  4.1  C          N  2.2  C     N  4.4  C     N  
   pred B:                  bbbb                                
   BB                       N2 C                                
   Predic     aaaaaaaaa     bbbb aaaaaaaaa     aaaaaaaaa     aaa
   a/acid     RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA
                    60        70        80        90       100
   pred A:    aaaaaa       aaaaaaaaaaaaaaaaaaaaaaa     aaaaaaaaa
   AA         2.2  C       N    4.2    CN   2.4  C     N  5.4  C
   pred B:                 bbbbbbb                              
   BB                      N 2.6 C                              
   Predic     aaaaaa       aaaaaaaaaaaaaaaaaaaaaaa     aaaaaaaaa
   a/acid     KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY

   The output of the prediction program presents not only the final
   optimal variant of the secondary structure assignment, but also a set
   of potential a-helix and b-strand segments that were computed without
   consideration of their competition. Because the protein secondary
   structure is finally stabilized during the formation of the tertiary
   structure, the alternative variants of the a-helix and b-strand
   segments may be important for methods of tertiary structure
   prediction.
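
For downstream use, the final assignment can be pulled out of a reply
with a short script. The sketch below is not part of the service; it
assumes the layout shown in the example output above: each "Predic" line
is followed by an "a/acid" line, the two are column-aligned with spaces
(no tabs), and a blank in the "Predic" line means neither a nor b, which
is written here as c.

```python
def parse_ssp_output(text):
    """Return (sequence, states) from the Predic and a/acid lines."""
    residues, states = [], []
    pending = None  # most recent Predic line not yet paired
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "Predic":
            pending = line
        elif tokens[0] == "a/acid" and len(tokens) > 1 and pending is not None:
            seq = tokens[1]
            label_end = line.index("a/acid") + len("a/acid")
            start = line.index(seq, label_end)  # column where residues begin
            block = pending[start:start + len(seq)].ljust(len(seq))
            residues.append(seq)
            states.append(block.replace(" ", "c"))
            pending = None
    return "".join(residues), "".join(states)


sample = (
    "   Predic     aaaaaaaaa     bbbb aaaaaaaaa     aaaaaaaaa     aaa\n"
    "   a/acid     RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA\n"
)
sequence, assignment = parse_ssp_output(sample)
```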

References:
  1. Solovyev V.V., Salamov A.A. Method of calculation of discrete
     secondary structures in globular proteins. Molek. Biol. 25:810-824,
     1991 (in Russ.)
  2. Solovyev V.V., Salamov A.A. Secondary structure prediction based on
     discriminant analysis. In Computer analysis of Genetic
     macromolecules (eds. Kolchanov N.A., Lim H.A.), World Scientific,
     1994, p.352-364.
  3. Solovyev V.V., Salamov A.A. Predicting a-helix and b-strand segments
     of globular proteins. CABIOS (1994) (accepted).

Problems, comments, and suggestions:
   Can be mailed to solovyev at cmb.bcm.tmc.edu.
   



