NNSSP and SSP SERVER FORMAT CHANGE
Dan Davison
dbd at THEORY.BCHS.UH.EDU
Mon Nov 7 09:37:46 EST 1994
The recently announced secondary structure prediction programs
NNSSP and SSP have had a minor but important change to their
*required* input. Please note the following:
The Baylor College Of Medicine Computational Biology Group
Houston, TX
announces a new service
SSP
Prediction of a-helix and b-strand segments of globular
proteins
***************************************************************************
*********** NOTE FORMAT HAS CHANGED AGAIN!! *********************
***************************************************************************
Analysis of uncharacterized human sequences is available by sending
a file containing a sequence name on the first line and the sequence
on subsequent lines (no more than 80 characters per line) to
service at bchs.uh.edu
with the Subject line "SSP". The exact required format is given under
"Submitting sequences via email" below.
Example: mail -s SSP service at bchs.uh.edu < test.seq
where test.seq is a file containing the sequence.
NOTE: This service is temporarily being provided through the
University of Houston Gene-Server. Only two jobs will be run at a
time.
Method description:
**********************
Our segment-oriented method is designed to locate secondary structure
elements and uses linear discriminant analysis to assign segments of a
given amino acid sequence to a particular type of secondary structure,
by taking into account the amino acid composition of internal parts
of segments as well as their terminal and adjacent regions.
Four linear discriminant functions were constructed to recognize
short and long a-helix segments and short and long b-strand segments.
These functions combine three characteristics: the hydrophobic moment,
the segment singlet preference, and the segment pair preference for
an a-helix or b-strand.
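As an illustration of how such a function might combine the three
characteristics, the following Python sketch computes a hydrophobic
moment and a linear score for one segment. It is not the SSP
implementation: the Kyte-Doolittle hydrophobicity scale, the 100-degree
(a-helix) periodicity angle, the function names, and any weights or
thresholds passed in are assumptions chosen for illustration; the
announcement specifies none of them.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed scale; the announcement
# does not say which scale SSP uses).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, angle_deg=100.0):
    """Hydrophobic moment of a segment for a given angle per residue.

    100 degrees per residue corresponds to an ideal a-helix; angles of
    roughly 160-180 degrees are typically used for b-strands.
    """
    step = math.radians(angle_deg)
    residues = segment.upper()
    sin_sum = sum(KYTE_DOOLITTLE[aa] * math.sin(i * step)
                  for i, aa in enumerate(residues))
    cos_sum = sum(KYTE_DOOLITTLE[aa] * math.cos(i * step)
                  for i, aa in enumerate(residues))
    return math.hypot(sin_sum, cos_sum)


def linear_discriminant(features, weights, threshold):
    """Generic linear discriminant: weighted feature sum minus a threshold.

    A positive value assigns the segment to the class being tested.
    The weights and threshold are hypothetical placeholders.
    """
    return sum(w * f for w, f in zip(weights, features)) - threshold
```

Here `features` would hold the moment together with singlet and pair
preference values for the segment; how SSP derives those preferences is
described in the references, not in this sketch.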
Accuracy:
************************
Overall three-state (a, b, c) prediction gives ~65.1% correctly
predicted residues (68.2% when homologous sequences are used) on 126
non-homologous proteins, using the jack-knife test procedure. This
accuracy is good when no homologous sequences are available: the
method of Sander et al. (Rost, Sander, J. Mol. Biol., 1993, 232,
584-599) has about 71% accuracy with homologous sequences and about
61% without them.
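The per-residue figure quoted here is the fraction of residues whose
predicted state (a, b, or c) matches the observed state. A minimal
Python sketch of that measure, with a hypothetical function name and
made-up strings:

```python
def three_state_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed one.

    Both arguments are strings over the states 'a', 'b', 'c', with one
    character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up example strings, not real SSP output: 8 of 10 states agree.
print(three_state_accuracy("aaaaccbbbb", "aaacccbbbc"))  # prints 80.0
```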
Analysis of the prediction results shows a high
prediction accuracy for long secondary structure segments (~89% of
a-helices of length greater than 8 and ~71% of b-strands of length
greater than 6 are correctly located, with probabilities of correct
prediction of 0.82 and 0.78, respectively).
Using the mean values of discriminant functions over the aligned
sequences of homologous proteins, we achieved a prediction accuracy of
68.2%.
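One way to read this averaging step: evaluate the discriminant function
for the same segment in every aligned sequence and take the mean before
applying the decision rule. The Python sketch below assumes a
caller-supplied scoring function (`score_segment` is hypothetical) and
does not address how gaps in the homologous sequences are treated,
which the announcement does not specify.

```python
def mean_discriminant(aligned_sequences, start, end, score_segment):
    """Average a segment score over all sequences of an alignment.

    aligned_sequences: equal-length strings, query first.
    start, end: 0-based, end-exclusive column range of the segment.
    score_segment: hypothetical callable returning a number for a string.
    """
    if not aligned_sequences:
        raise ValueError("need at least one sequence")
    scores = [score_segment(seq[start:end]) for seq in aligned_sequences]
    return sum(scores) / len(scores)
```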
Note that our variant of the nearest-neighbor algorithm has 72%
accuracy when using multiple sequence alignments of homologous
proteins and 67.6% accuracy without homologous proteins
(see the "nnssp" program of this server).
Submitting sequences via email:
***********************************
For email submission the sequences must have the following format:
a) if you send one sequence:
line 1 - sequence name
line 2 - the number 1 in format I5 (right-justified in a 5-character field)
line 3 and subsequent lines - amino acid sequence
for example:
ADENYLATE KINASE
1
RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA
KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY
QIDTVINLNVPFEVIKQRLTARWIHPGSGRVYNIEFNPPKTMGIDDLTGE
PLVQREDDRPETVVK............
(Restrict the line length to 75 characters).
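A file in this single-sequence format could be produced with a short
script. The Python sketch below is only an illustration: the function
name and the default of 50 residues per line (copied from the example
above) are assumptions. It writes the name, the number 1 right-justified
in a 5-character field as the Fortran-style I5 format implies, and the
sequence wrapped to at most 75 characters per line.

```python
def format_single_sequence(name, sequence, width=50):
    """Return SSP single-sequence input as one string.

    width is the number of residues per line; it must not exceed 75.
    """
    if not 1 <= width <= 75:
        raise ValueError("width must be between 1 and 75")
    # Drop any whitespace or line breaks inside the input sequence.
    residues = "".join(sequence.split())
    if not residues.isalpha():
        raise ValueError("sequence must be non-empty and contain letters only")
    lines = [name, f"{1:5d}"]
    lines += [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join(lines) + "\n"
```

The returned string can be written to a file such as test.seq and
mailed as described in this announcement.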
b) if you send multiple aligned sequences:
line 1 - sequence name
line 2 - number of aligned sequences and length of the protein
line 3 and subsequent lines - aligned sequences in format 60a1
(60 characters per line; line 3, like the other lines that separate
blocks of the alignment, is either empty or contains position numbers)
for example:
ACTINOXANTHIN
5 107
10 20 30 40 50 60 (numbers not necessary)
APAFSVSPASGASDGQSVSVSVAAAGETYYIAQaAPVGGQDAaNPATATSFTTDASGAAS
APAFSVSPASGLSDGQSVSVSGAAAGETYYIAQCAPVGGQDACNPATATSFTTDASGAAS
APTATVTPSSGLSDGTVVKVAGAgaGTAYDVGQCAWVdgVLACNPADFSSVTADANGSAS
APGVTVTPATGLSNGQTVTVSATgpGTVYHVGQCAVvpGVIGCDATTSTDVTADAAGKIT
ATPKSSSGGAGASTGSGTSSAAVTSgaASSAQQSGLQGATGAGGGSSSTPGTQPGSGAGG
70 80 90 100
FSFTVRKSYAGQTPSGTPVGSVDbATDAbNLGAGNSGLNLGHVALTF
FSFV-RKSYAGZTPSGTPVGSVDCATDACNLGAGNSGLNLGHVALTF
TSLTVRRSFEGFLFDGTRWGTVDCTTAACQVGLSDAAGNGpgVAISF
AQLKVHSSFQAVvaNGTPWGTVNCKVVSCSAGLGSDSGEGAAQAITF
AIAARPVSAMGGtpPHTVPGSTNTTTTAMAGGVGGPgaNPNAAALM-
(you can use small letters for Cys amino acids, if you want)
The alignment MUST NOT contain deletions in the first (query) sequence!!!
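The alignment format can be generated and checked in a similar way.
This Python sketch (hypothetical function name) writes blocks of 60
columns, each preceded by an empty separator line, and rejects
alignments whose first sequence contains a gap. Treating '-' as the
only gap symbol and writing the two header numbers separated by a
single space, as in the example, are assumptions.

```python
def format_alignment(name, aligned, block=60, gap="-"):
    """Return SSP multiple-alignment input as one string.

    aligned: list of equal-length strings; the first is the query.
    """
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    if gap in aligned[0]:
        raise ValueError("the query (first) sequence must not contain gaps")
    lines = [name, f"{len(aligned)} {length}"]
    for start in range(0, length, block):
        lines.append("")  # separator line (may be empty or hold numbers)
        lines += [seq[start:start + block] for seq in aligned]
    return "\n".join(lines) + "\n"
```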
Send the file containing the sequence to:
service at bchs.uh.edu
The subject line must be:
Subject: SSP
Example: mail -s SSP service at bchs.uh.edu < test.seq
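For users without a command-line mail program, an equivalent message
could be assembled with Python's standard library. The sketch below
only builds the message. The function name is hypothetical, the sender
address is a placeholder supplied by the caller, and the recipient is
the address above written with '@'. Actually sending the message (for
example through smtplib and a mail server) depends on the local setup.

```python
from email.message import EmailMessage


def build_ssp_message(sender, body, recipient="service@bchs.uh.edu"):
    """Build a plain-text message with the Subject line the server expects."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SSP"
    # The body is the sequence file content, e.g. the text of test.seq.
    msg.set_content(body)
    return msg
```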
Example of SSP output:
*****************************
ADENYLATE KINASE
10 20 30 40 50
pred A: aaaaaaaaa aaaaaaaaa aaaaaaaaa aaa
AA N 4.1 C N 2.2 C N 4.4 C N
pred B: bbbb
BB N2 C
Predic aaaaaaaaa bbbb aaaaaaaaa aaaaaaaaa aaa
a/acid RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA
60 70 80 90 100
pred A: aaaaaa aaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaa
AA 2.2 C N 4.2 CN 2.4 C N 5.4 C
pred B: bbbbbbb
BB N 2.6 C
Predic aaaaaa aaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaa
a/acid KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY
The output of the prediction program presents not only the final
optimal variant of the secondary structure assignment, but also a set
of potential a-helix and b-strand segments that were computed without
considering their competition. Because the protein secondary
structure is finally stabilized during the formation of the tertiary
structure, these alternative a-helix and b-strand segments may be
important for methods of tertiary structure prediction.
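For downstream use, a prediction line can be turned into a list of
segments. The Python sketch below assumes the prediction string has
exactly one character per residue ('a', 'b', or a blank for coil), as
in column-aligned output; the function name and the example string are
made up.

```python
from itertools import groupby


def extract_segments(prediction):
    """Return (state, start, end) for each run of 'a' or 'b'.

    start and end are 1-based residue positions, both inclusive.
    """
    segments = []
    position = 1
    for state, run in groupby(prediction):
        length = len(list(run))
        if state in ("a", "b"):
            segments.append((state, position, position + length - 1))
        position += length
    return segments


# Made-up prediction string, not real SSP output.
print(extract_segments("  aaaa  bbb aa"))
# prints [('a', 3, 6), ('b', 9, 11), ('a', 13, 14)]
```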
References:
1. Solovyev V.V., Salamov A.A. (1991) Method of calculation of
discrete secondary structures in globular proteins. Molek. Biol.
25:810-824 (in Russian).
2. Solovyev V.V., Salamov A.A. (1994) Secondary structure prediction
based on discriminant analysis. In Computer analysis of Genetic
macromolecules (eds. Kolchanov N.A., Lim H.A.), World Scientific,
p.352-364.
3. Solovyev V.V., Salamov A.A. (1994) Predicting a-helix and b-strand
segments of globular proteins. CABIOS (accepted).
Problems, comments, and suggestions
can be mailed to solovyev at cmb.bcm.tmc.edu.