Modifications HSPL Genefinder service

Dan Davison dbd at THEORY.BCHS.UH.EDU
Tue Oct 25 10:21:54 EST 1994


      The Baylor College Of Medicine Computational Biology Group
			     Houston, TX
		       announces a new service

                                 HSPL
      Email server for splice site prediction in Human sequences


***************************************************************************
*********** NOTE ADDRESSES AND FORMATS HAVE CHANGED!! *********************
***************************************************************************

Analysis of uncharacterized human sequences is available by sending a
file containing a sequence name on the first line, followed by the
sequence (no more than 80 chars/line), to

                      service at bchs.uh.edu

with the subject line "HSPL". 

Example: mail -s HSPL service at bchs.uh.edu < test.seq

where test.seq is a file containing the sequence.
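
For those who prefer to script the request rather than call mail directly,
the following is a minimal Python sketch that only builds the message and
does not send it. The function name build_hspl_request and the example.org
addresses are hypothetical placeholders, not part of the service; the one
detail taken from this announcement is the subject line "HSPL".

```python
from email.message import EmailMessage


def build_hspl_request(sequence_file_text, recipient, sender):
    """Build (but do not send) a mail message carrying an HSPL request.

    recipient and sender are supplied by the caller; nothing here is
    specific to the server beyond the "HSPL" subject line.
    """
    msg = EmailMessage()
    msg["Subject"] = "HSPL"  # subject line the server expects
    msg["To"] = recipient
    msg["From"] = sender
    msg.set_content(sequence_file_text)  # body is the sequence file itself
    return msg


if __name__ == "__main__":
    # Placeholder addresses, for illustration only.
    request = build_hspl_request(" TEST\nacgtacgt\n",
                                 "service@example.org", "me@example.org")
    print(request["Subject"])
```

Handing the resulting message to a local mail transport (for example with
smtplib) is left out, since that depends on the local mail setup.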


NOTE: This service is temporarily being provided through the
University of Houston Gene-Server.  Only two jobs will be run at a
time.
 
Method description:
*******************

Using information about significant triplet frequencies in various
functional parts of splicing site regions, and preferences of
octanucleotides in protein coding and intron regions, a combined
linear discriminant recognition function was developed. On the test
set, the splice site prediction scheme gives 97% accuracy for donor
site recognition (correlation coefficient C=0.62) and 96% accuracy for
acceptor splice sites (C=0.48). The method is a good alternative to
the neural network approach (Brunak et al., Mol.Biol., 1991), which
has C=0.61 with 95% accuracy of donor site prediction and C < 0.40
with 95% accuracy of acceptor site prediction.
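
To illustrate how a high accuracy can go together with a moderate
correlation coefficient, here is a small Python sketch. It assumes that C
is the Matthews-style correlation coefficient computed from true/false
positives and negatives, and that accuracy is the fraction of correctly
classified sites; the announcement does not define either quantity, so
both are assumptions. The counts in the example are invented for
illustration and are not the test-set results quoted above.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all candidate sites that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def correlation_coefficient(tp, tn, fp, fn):
    """Matthews-style correlation coefficient (assumed definition of C)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # undefined when a row or column of the table is empty
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Invented counts: 100 true sites hidden among 9900 pseudo sites.
    tp, fn, fp, tn = 90, 10, 290, 9610
    print("accuracy:", accuracy(tp, tn, fp, fn))
    print("C:", round(correlation_coefficient(tp, tn, fp, fn), 2))
```

Because pseudo sites greatly outnumber true sites, even a small false
positive rate lowers C noticeably while the accuracy stays high.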

More precise splice site positions may be found by using the exon
recognition programs (HEXON, FEXH) and the gene structure prediction
program (FGENEH) available from the server.
 
=========================  HSPL citation   ===============================
Please cite one of the following papers in your references:

Solovyev V.V., Lawrence C.B. (1994) Prediction of Primate mRNA donor 
and acceptor splice sites based on oligonucleotide composition. Mol.Biol.
(submitted).
or
Solovyev V.V., Salamov A.A., Lawrence C.B. 1994. The prediction of Human exons 
by oligonucleotide composition and discriminant analysis of spliceable open 
reading frames. In Proceedings of the Second International Conference on 
Intelligent Systems for Molecular Biology (eds. Altman R., Brutlag D.,
Karp R., Lathrop R. and Searls D.), AAAI Press, Menlo Park, CA (in press).
or
Solovyev V.V., Salamov A.A., Lawrence C.B. 1994. 
Predicting internal exons by oligonucleotide composition and discriminant 
analysis of spliceable open reading frames. Nucleic Acids Res. (in press).



The current version of the program predicts only splice sites with the
conserved dinucleotides GT and AG for donor and acceptor splice sites,
respectively. These usually account for more than 99% of all authentic
splice sites.
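
As a rough illustration of what restricting prediction to GT and AG means,
the Python sketch below lists every position in a sequence where these
dinucleotides occur. It is only a candidate scan, not the discriminant
function used by the program. The function name candidate_sites and the
position convention (1-based index of the first base of the dinucleotide)
are assumptions; the announcement does not say how the program numbers
positions.

```python
def candidate_sites(sequence):
    """Return (donor_candidates, acceptor_candidates) as 1-based positions.

    A donor candidate is any GT dinucleotide and an acceptor candidate is
    any AG dinucleotide. Real prediction would then score each candidate.
    """
    seq = sequence.lower()
    donors = [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] == "gt"]
    acceptors = [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] == "ag"]
    return donors, acceptors


if __name__ == "__main__":
    donors, acceptors = candidate_sites("ccgtaag")
    print("GT at:", donors)
    print("AG at:", acceptors)
```

Most such candidates are pseudo sites, which is why a scoring function and
thresholds are needed on top of the dinucleotide match.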

Future versions of the program will have options for the other
variants of conserved dinucleotides and extensions to other
species.

Input data:
***************
	The following is an example of the input format for the program:

The 1st line contains 2 thresholds (donor and acceptor). You can use the
values shown or decrease them slightly if you want more candidate sites.

The 2nd line is the name of your sequence, starting with a space character.

The 3rd and subsequent lines are the sequence (each line must be no more
than 80 letters).
-----------------------------------------------------------
 76 65
   HUMALPHA      ds-DNA             
cccgggctgtgtgcttccagcctcccctcctctcgacaccagaacagagcctggccccca
gctcccaggaaatacagaaaaaaaaaatggtggatgaacgagtgacagggtgtcttgttc
cacacaagacacagtgagcaggggttgggggaggggcccctggggcaggatgcacactgc
actatacccaaaatccccacccttccctggggacacctggtccaccctaagctgcctttc
---------------------------------------------------------------
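
The layout above can be produced from a raw sequence with a few lines of
Python. This is a minimal sketch: the function name format_hspl_input is
hypothetical, the default thresholds 76 and 65 are simply the values from
the example, and the leading space on the threshold line is copied from
the example rather than from any stated rule.

```python
def format_hspl_input(name, sequence, donor_threshold=76,
                      acceptor_threshold=65, width=60):
    """Format a sequence as thresholds line, name line, sequence lines."""
    if not 1 <= width <= 80:
        raise ValueError("sequence lines must be 1 to 80 letters long")
    seq = "".join(sequence.split())  # drop any whitespace or line breaks
    lines = [
        f" {donor_threshold} {acceptor_threshold}",  # 1st line: thresholds
        f" {name}",                                  # 2nd line: starts with a space
    ]
    # 3rd and later lines: the sequence, wrapped to at most `width` letters
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_hspl_input("HUMALPHA", "cccgggctgtgtgcttccagcc" * 5), end="")
```

A width of 60 matches the example; any value up to 80 satisfies the stated
line-length limit.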

The output of the program (enclosed below) includes the name and length
of the sequence and the positions and scores of the predicted splice
sites. Note that some of the predicted sites are pseudo splice sites;
the higher the score of a site, the more likely it is to be an
authentic splice site.

Please send questions, comments, and suggestions about the program by
Email to solovyev at cmb.bcm.tmc.edu.

Program output:

   HUMALPHA     4556 bp ds-DNA             PRI       15-SEP-1                   
 Length of sequence -   4556
Number of Donor    sites:   11 Threshold: 0.76
    1    329  0.76
    2    517  0.87
    3    728  0.88
    4    955  0.98
    5   1322  0.81
    6   1954  0.85
    7   1967  0.82
    8   2126  0.84
    9   2389  0.84
   10   2662  0.79
   11   2998  0.92
Number of Acceptor sites:   18 Threshold: 0.65
    1    244  0.65
    2    379  0.67
    3    610  0.89
    4    615  0.68
    5    838  0.83
    6   1146  0.75
    7   1398  0.71
    8   1818  0.78            
    9   1828  0.66
   10   2052  0.88
   11   2253  0.84
   12   2469  0.81
   13   2880  0.81
   14   3119  0.80
   15   3480  0.70
   16   3989  0.69
   17   4059  0.70
   18   4273  0.71
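
A reply in the format shown above can be read back into a program with a
short parser. The Python sketch below is one possible approach, written
against this sample output only; the function name parse_hspl_output is
hypothetical, and the parser assumes that each site line has exactly three
fields (running number, position, score) following a "Number of ... sites:"
header line.

```python
import re


def parse_hspl_output(text):
    """Return {"donor": [(position, score), ...], "acceptor": [...]}."""
    sites = {"donor": [], "acceptor": []}
    current = None  # which block we are in, if any
    for line in text.splitlines():
        header = re.match(r"\s*Number of (Donor|Acceptor)\s+sites:", line)
        if header:
            current = header.group(1).lower()
            continue
        fields = line.split()
        if current and len(fields) == 3:
            try:
                sites[current].append((int(fields[1]), float(fields[2])))
            except ValueError:
                pass  # not a site line; ignore it
    return sites


def best_first(site_list):
    """Sort (position, score) pairs so the highest scores come first."""
    return sorted(site_list, key=lambda site: site[1], reverse=True)
```

Sorting by score with best_first puts the sites most likely to be
authentic at the top of each list.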





