FGENESH_GC - NONCANONICAL GC in predicting genes/ ALTERNATIVELY SPLICED genes

webmaster_softberry_com at acedsl.com
Thu Jan 31 08:54:15 EST 2002


A version of the FGENESH program that includes the NONCANONICAL GC
dinucleotide in donor splice sites is now available for on-line use at:
 
www.softberry.com 

This program is useful for analyzing ALTERNATIVE gene structure, where
non-standard splice sites are often found (see also the FGENES-M program for
predicting alternative gene variants), and for creating A SET of GENES and
PROTEINS that are absent from standard gene prediction.

The GC donor splice site accounts for the major part of non-standard splice
sites in human genes. It represents about 0.6% of all splice sites and is
observed in more than 5% of human genes. Gene predictions on large-scale
genomic sequences will therefore contain hundreds of GC-donor exons and
require programs that can predict most of them.

We recently investigated noncanonical splice sites (Burset, Seledtsov and
Solovyev, 2000, Nucleic Acids Res., 28(21), 4364-4375) and obtained about
20000 splice sites verified by ESTs. We also obtained a very strong GC-donor
site weight matrix, which is used in the gene prediction program. We have
developed this variant of the program to predict GC-donor exons in addition
to standard exons while preserving the accuracy of the program on standard
genes. Testing the program on 68 human genes with at least one GC donor site
shows that FGENESH (GC) provides a 10% higher rate of exact exon prediction
for this group and 5% higher accuracy at the nucleotide level.
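
To illustrate what a GC-donor exon looks like at the sequence level, here is
a minimal Python sketch (not part of FGENESH). It assumes a plus-strand exon
given by its 1-based end position and reads the first two intron bases
immediately after that position. The function names and the toy sequences
are hypothetical and were made up for this illustration.

```python
def donor_dinucleotide(seq: str, exon_end: int) -> str:
    """Return the two bases following a plus-strand exon.

    exon_end is the 1-based, inclusive position of the last exon base,
    so the donor dinucleotide occupies positions exon_end+1 and exon_end+2.
    """
    return seq[exon_end:exon_end + 2].upper()


def classify_donor(dinuc: str) -> str:
    """Label a donor dinucleotide as canonical GT, noncanonical GC, or other."""
    if dinuc == "GT":
        return "canonical GT"
    if dinuc == "GC":
        return "noncanonical GC"
    return "other"


# Toy sequences (invented): a 7-base exon, a short intron, a second exon.
TOY_GC = "ATGAAAG" + "gcaagtcccttag" + "GTTTAA"
TOY_GT = "ATGAAAG" + "gtaagtcccttag" + "GTTTAA"

for name, seq in (("TOY_GC", TOY_GC), ("TOY_GT", TOY_GT)):
    dinuc = donor_dinucleotide(seq, exon_end=7)
    print(name, dinuc, classify_donor(dinuc))
```

For minus-strand genes the sequence would first have to be reverse
complemented and the coordinates converted; that case is left out here.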

Select Human parameters and click the FGENESH_GC button, then paste your
sequence into the window or load a file containing your sequence in FASTA
format.
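
If the sequence is not yet in FASTA format, a small helper like the one
below can prepare it. This is only a sketch: the function name, the record
name and the 60-column line width are assumptions chosen for illustration,
not requirements stated by the server.

```python
def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a raw sequence as a single FASTA record.

    Whitespace and line breaks inside the input are removed; the case of
    the bases is left unchanged.
    """
    cleaned = "".join(sequence.split())
    lines = [">" + name]
    lines += [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join(lines) + "\n"


# Hypothetical record name and toy sequence.
print(to_fasta("my_sequence", "atggcc gactca\ngagctg", width=10), end="")
```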

Solovyev V.V. (2001) Statistical approaches in Eukaryotic gene prediction. 
In Handbook of Statistical genetics (eds. Balding D. et al.), 
John Wiley & Sons, Ltd., p. 83-127. 

Fgenesh_GC output: 

(IN THIS EXAMPLE THE 2nd EXON, WHICH HAS A GC-DONOR SITE, IS FOUND; IT IS
LOST by STANDARD gene finders)
 
G - predicted gene number, counted from the start of the sequence; 
Str - DNA strand (+ for direct or - for complementary); 
Feature - type of coding sequence: CDSf - first (starting with the start
codon), CDSi - internal (internal exon), CDSl - last coding segment (ending
with the stop codon); 
TSS - position of transcription start (TATA-box position and score); 
Start and End - positions of the feature; 
Weight (the Score column in the table) - log likelihood*10 score for the
feature; 
ORF - start/end positions where the first complete codon starts and the last
codon ends. 


fgeneshgc  Wed Jan 30 20:59:27 EST 2002
 FGENESH (with GC possible donor site) Gene prediction in Human      genomic DNA
 Time:   Wed Jan 30 20:59:27 2002
 Seq name: Softberry SERVER PAST Sequence 
 Length of sequence:  2932  GC content: 65 Zone: 4
 Number of predicted genes 1 in +chain 1 in -chain 0
 Number of predicted exons 5 in +chain 5 in -chain 0
 Positions of predicted genes and exons:
  G Str Feature    Start     End   Score        ORF           Len

  1 +   1 CDSf     501 -     580     15.57     501 -     578     78
  1 +   2 CDSi     747 -     853     22.53     748 -     852    105
  1 +   3 CDSi    1847 -    1980     17.97    1849 -    1980    132
  1 +   4 CDSi    2255 -    2333     10.88    2255 -    2332     78
  1 +   5 CDSl    2563 -    2705     15.94    2565 -    2705    141

Predicted protein(s):
>FGENESH   1   5 exon (s)    501  -   2705    180 aa, chain +
MADSELQLVEQRIRSFPDFPTPGVVFRDISPVLKDPASFRAAIGLLARHLKATHGGRIDY
IAGLDSRGFLFGPSLAQELGLGCVLIRKRGKLPGPTLWASYSLEYGKAELEIQKDALEPG
QRVVVVDDLLATGGTMNAACELLGRLQAEVLECVSLVELTSLKGREKLAPVPFFSLLQYE
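
For anyone who wants to process such a report automatically, the exon table
can be read with a short script. The following Python sketch is not a
Softberry tool; the regular expression, the Exon record and the function
name are assumptions based only on the column layout of the example above,
and other FGENESH reports may contain line types it does not handle.

```python
import re
from typing import List, NamedTuple


class Exon(NamedTuple):
    gene: int
    strand: str
    number: int
    feature: str      # the legend above lists CDSf, CDSi and CDSl
    start: int
    end: int
    score: float
    orf_start: int
    orf_end: int
    orf_len: int


# One exon row: G, Str, exon number, Feature, Start - End, Score, ORF, Len.
EXON_LINE = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\d+)\s+(CDS\w)\s+"
    r"(\d+)\s+-\s+(\d+)\s+(-?\d+(?:\.\d+)?)\s+"
    r"(\d+)\s+-\s+(\d+)\s+(\d+)\s*$"
)


def parse_exons(report: str) -> List[Exon]:
    """Collect every line of the report that looks like an exon row."""
    exons = []
    for line in report.splitlines():
        m = EXON_LINE.match(line)
        if m is None:
            continue  # header, protein sequence, blank line, etc.
        g, strand, n, feat, s, e, score, o_s, o_e, o_len = m.groups()
        exons.append(Exon(int(g), strand, int(n), feat, int(s), int(e),
                          float(score), int(o_s), int(o_e), int(o_len)))
    return exons


# The five exon rows copied from the example output above.
SAMPLE = """\
  1 +   1 CDSf     501 -     580     15.57     501 -     578     78
  1 +   2 CDSi     747 -     853     22.53     748 -     852    105
  1 +   3 CDSi    1847 -    1980     17.97    1849 -    1980    132
  1 +   4 CDSi    2255 -    2333     10.88    2255 -    2332     78
  1 +   5 CDSl    2563 -    2705     15.94    2565 -    2705    141
"""

exons = parse_exons(SAMPLE)
coding_bases = sum(x.end - x.start + 1 for x in exons)
print(len(exons), "exons,", coding_bases, "coding bases")
```

As a consistency check, the five exons span 543 bases, i.e. 181 codons
including the stop codon, which agrees with the 180 aa protein reported
above.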


