SPLM Canonical/Non-canonical splice sites in Human DNA

webmaster webmaster at softberry.com
Thu Apr 12 03:15:59 EST 2001


SPLM - Canonical and non-canonical splice site prediction in 
human DNA sequences

SPLM is a program for predicting splice sites in human DNA 
sequences, developed by Salamov A. and Solovyev V.

It is installed at http://www.softberry.com/nucleo.html

It locates potential splice site positions using 5 weight 
matrices for donor sites and, for acceptor sites, a model that 
combines dinucleotide composition with a weight matrix.
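
As a rough illustration of what scoring with a weight matrix means, 
here is a minimal Python sketch. It is not SPLM's code: the matrix 
TOY, its two columns, and its weights are invented for the example, 
and the function names are hypothetical. SPLM's real matrices are 
not described here.

```python
def pwm_score(window, matrix):
    """Sum the per-position weights of the bases in `window`."""
    return sum(column[base] for base, column in zip(window, matrix))


def scan(seq, matrix):
    """Score every window of the matrix width along `seq`.

    Returns a list of (0-based start, score) pairs.
    """
    width = len(matrix)
    return [(i, pwm_score(seq[i:i + width], matrix))
            for i in range(len(seq) - width + 1)]


# Toy two-column matrix that simply rewards "GT" (made-up weights,
# NOT taken from SPLM).
TOY = [
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 2.0},
]

best_start, best_score = max(scan("AAGTAA", TOY), key=lambda s: s[1])
```

A real matrix would span more positions around the site, but the 
scanning loop would look much the same.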

The program also predicts potential GC donor sites and 
non-standard splice sites such as AT-AC.

The program does NOT exclude splice sites that lie close to sites 
predicted with higher scores, or sites on different chains. Users 
can do their own post-processing based on the reported scores. It 
is designed to be useful for analyzing ALTERNATIVE splice variants 
and NON-CANONICAL splice sites. The program reports a much higher 
number of overpredicted sites than the SPL program.
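
One possible form of such post-processing, sketched in Python: keep 
a site only if no higher-scoring site of the same type and chain 
lies within a fixed distance. The function name and the window of 
10 bases are assumptions made for this example and are not part of 
SPLM. The positions and scores are acceptor sites 3-5 from the 
sample output.

```python
def suppress_neighbours(sites, window=10):
    """Drop sites that lie within `window` bases of a better-scoring site.

    `sites` is a list of (position, score) pairs for ONE site type on
    ONE chain; the window size is an arbitrary illustrative choice.
    """
    kept = []
    # Visit sites from highest to lowest score (ties: lower position first).
    for pos, score in sorted(sites, key=lambda s: (-s[1], s[0])):
        if all(abs(pos - kept_pos) > window for kept_pos, _ in kept):
            kept.append((pos, score))
    return sorted(kept)


acceptors = [(188, 13), (191, 91), (314, 44)]
filtered = suppress_neighbours(acceptors)
```

Here the site at 188 is dropped because it is 3 bases from the 
higher-scoring site at 191. For alternative splicing analysis you 
may prefer to skip this step, since nearby sites can be real 
alternatives.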

For a description, see:
Solovyev V.V. (2001) Statistical approaches in Eukaryotic gene 
prediction. In Handbook of Statistical genetics (eds. Balding D. 
et al.), John Wiley & Sons, Ltd., p. 83-127. 

 TO RUN LOCALLY : ./splm param sequence 

 where   param - name of file with parameters  

 and  sequence - name of file with sequence   

 Options: 

 -d threshold for donor splice sites (default = 95:  -d 95)

 -a threshold for acceptor splice sites (default = 95: -a 95)

 -dGC threshold for GC donor splice sites (default = 95: -dGC 95)

 -nc 1 allow search for AT-AC sites  (default = 0: -nc 0)

 Threshold values range from 1 to 100. 
 For example, a value of 30 means that the threshold is set 
 at the level which detects the 30% highest-scoring sites
 in the database of all known splice sites.

 A score of 20 means that the site scores better than
 the bottom 20% of known sites ordered by score.
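
 The two statements above can be read as a percentile scale. The 
 Python sketch below shows that reading only. It assumes a reported 
 score is the percentage of known sites with a lower raw score, and 
 that threshold T therefore keeps sites with a score of at least 
 100 - T. Neither rule is confirmed for SPLM, and the function names 
 and the list of known scores are hypothetical.

```python
from bisect import bisect_left


def percentile_score(raw, known_sorted):
    """Percentage of known sites whose raw score is strictly below `raw`.

    `known_sorted` must be sorted in ascending order.
    """
    below = bisect_left(known_sorted, raw)
    return 100.0 * below / len(known_sorted)


def passes(score_pct, threshold):
    """Assumed rule: threshold T admits sites scoring at least 100 - T."""
    return score_pct >= 100 - threshold


# Hypothetical raw scores of 100 known sites: 0, 1, ..., 99.
known = list(range(100))
score = percentile_score(20, known)
```

 Under this reading, a threshold of 30 admits only sites with a 
 score of 70 or more, which are the top 30% of the known sites.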

 
 Example of a run with default parameters:
 splm hum_spl.dat t.seq > t.res
 Or, with explicit options:
 splm hum_spl.dat t.seq -d 90 -a 90 -dGC 90 -nc 1 > t.res
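
 If you want to call the program from a script, the commands above 
 can be built in Python roughly as follows. This is a sketch: the 
 helper names are hypothetical, "splm" is assumed to be on your 
 PATH, and only the options listed above are used.

```python
import subprocess


def build_splm_command(param_file, seq_file, donor=None, acceptor=None,
                       gc_donor=None, at_ac=None, exe="splm"):
    """Build the argument list: parameter file, sequence file, then options."""
    cmd = [exe, param_file, seq_file]
    options = (("-d", donor), ("-a", acceptor),
               ("-dGC", gc_donor), ("-nc", at_ac))
    for flag, value in options:
        if value is not None:  # omitted options fall back to splm defaults
            cmd += [flag, str(value)]
    return cmd


def run_splm(cmd, out_path):
    """Run the command and write its standard output to `out_path`."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


cmd = build_splm_command("hum_spl.dat", "t.seq",
                         donor=90, acceptor=90, gc_donor=90, at_ac=1)
# run_splm(cmd, "t.res")  # uncomment where splm is installed
```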


 

Example of output:
splm  Wed Apr 11 23:16:32 EDT 2001
 Prediction of splice sites on Human sequences
 Length of sequence   2040
Number of Donor    sites:     10 Threshold:   90
    1     130      68   -   GT
    2     463      14   +   GT
    3     642      26   +   GT
    4     710      12   +   GT
    5     845      30   +   GT
    6     962      55   -   GT
    7    1024      48   +   GT
    8    1255      22   +   GT
    9    1363      42   +   GT
   10    2029      70   +   GT
Number of Acceptor sites:     29 Threshold:   90
    1      23      43   -   AG
    2     131      13   -   AG
    3     188      13   -   AG
    4     191      91   -   AG
    5     314      44   -   AG
    6     359      14   -   AG
    7     380      29   -   AG
    8     446      74   -   AG
    9     499      14   -   AG
   10     704      15   -   AG
   11     805      19   -   AG
   12     839      39   -   AG
   13     900      14   -   AG
   14     925       9   -   AC
   15     940      26   -   AG
   16    1065      93   +   AG
   17    1401      36   +   AG
   18    1488      80   +   AG
   19    1542      41   +   AG
   20    1593      62   +   AG
   21    1626      49   +   AG
   22    1637      18   -   AG
   23    1674      32   +   AG
   24    1708      41   +   AG
   25    1786      11   +   AG
   26    1825      15   +   AG
   27    1859      84   +   AG
   28    2003      13   +   AG
   29    2020      23   -   AG
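
Output in this layout can be read into a script with a short 
parser. The Python sketch below assumes the five columns are site 
number, position, score, chain (+ or -) and the site dinucleotide. 
The column meanings are not spelled out above, so treat the field 
names as guesses. The names Site and parse_splm are hypothetical.

```python
import re
from typing import NamedTuple


class Site(NamedTuple):
    kind: str          # "donor" or "acceptor", from the section header
    index: int         # assumed: running number within the section
    position: int      # assumed: position in the sequence
    score: int         # assumed: reported score
    chain: str         # "+" or "-"
    dinucleotide: str  # e.g. "GT", "AG", "AC"


HEADER = re.compile(
    r"Number of (Donor|Acceptor)\s+sites:\s+(\d+)\s+Threshold:\s+(\d+)")
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+([+-])\s+([A-Z]{2})\s*$")


def parse_splm(text):
    """Return a list of Site records found in splm output text."""
    sites = []
    kind = None
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            kind = header.group(1).lower()
            continue
        row = ROW.match(line)
        if row and kind:
            sites.append(Site(kind, int(row.group(1)), int(row.group(2)),
                              int(row.group(3)), row.group(4), row.group(5)))
    return sites
```

For example, parse_splm(open("t.res").read()) would give one record 
per reported site, which can then be sorted or filtered by score.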

