Advanced version of FGENES for multiple gene prediction

Victor Solovyev solovyev at sanger.ac.uk
Tue Nov 24 13:55:29 EST 1998


We have installed FGENES 1.6, a program for multiple gene prediction.
It is available on the Computational Genomic Group web server
at http://genomic.sanger.ac.uk/ (http://genomic.sanger.ac.uk/gf/gf.html).
It is significantly more accurate than the previous version.
Enclosed are accuracy results for the standard Guigo test set and for
a test set of long or multiple human genes, followed by an example of
a 100% correct prediction for the 32906-bp Human IFNAR gene
(interferon alpha/beta receptor).

Exon-level figures count only exactly predicted exons (both boundaries
correct); nucleotide-level figures also give credit for partially
predicted exons.
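To make the distinction concrete, here is a minimal Python sketch on
made-up exon coordinates (not taken from either test set). It assumes
exons are given as inclusive (start, end) pairs; an exon counts at the
exon level only when both boundaries match, while at the nucleotide
level every correctly predicted coding base counts.

```python
# Hypothetical toy data: exons as inclusive (start, end) coordinates.

def exon_level(real, pred):
    """Exon-level Sn/Sp: an exon counts only if both boundaries match exactly."""
    exact = set(real) & set(pred)
    return len(exact) / len(real), len(exact) / len(pred)

def coding_positions(exons):
    """Expand inclusive (start, end) exons into a set of coding positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions

def nucleotide_level(real, pred):
    """Nucleotide-level Sn/Sp: partially correct exons still earn credit."""
    real_nt, pred_nt = coding_positions(real), coding_positions(pred)
    tp = len(real_nt & pred_nt)
    return tp / len(real_nt), tp / len(pred_nt)

real = [(100, 199), (300, 399)]
pred = [(100, 199), (310, 399)]  # second exon has a wrong acceptor site

print(exon_level(real, pred))
print(nucleotide_level(real, pred))
```

The second exon misses only 10 bases, so it scores zero at the exon
level but still contributes 90 correct bases at the nucleotide level.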

Guigo dataset of 570 genes:
===================================
                                        Fgenes 1.6:
     ALL EXONS: OBSERVED - 2663 EXACTLY PREDICTED: 2233 84%
     (averaged over all genes)
     Sne- 82.7 Spe- 82.0 Sn_n- 91.9 Sp_n- 93.1 C- 0.92
     no prediction cases - 1
     Init: Observed - 570 Predicted - 576 Correct - 470 82%
     Intr: Observed - 1523 Predicted - 1548 Correct - 1311 86%
     Term: Observed - 570 Predicted - 567 Correct - 452 79%
     Sngl: Observed - 0 Predicted - 7 Correct - 0
                                        Genscan:
     ALL EXONS: OBSERVED - 2663 EXACTLY PREDICTED: 2166 81%
     Sne- 77.7 Spe- 80.8 Sn_n- 93.1 Sp_n- 92.8 C- 0.92
     no prediction cases - 8
     Init: Observed - 570 Predicted - 449 Correct - 369 65%
     Intr: Observed - 1523 Predicted - 1688 Correct - 1366 90%
     Term: Observed - 570 Predicted - 487 Correct - 431 76%
     Sngl: Observed - 0 Predicted - 3 Correct - 0
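Sne - sensitivity on the exon level; Spe - specificity on the exon level
Sn_n - sensitivity on the nucleotide level; Sp_n - specificity on the
nucleotide level; C - correlation coefficient on the nucleotide level
Init, Intr, Term - initial, internal and terminal exons;
Sngl - exons of single-exon genes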
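The legend gives no formulas. Assuming the usual Burset-Guigo
definitions (specificity as TP/(TP+FP), and C as the nucleotide-level
correlation coefficient), the nucleotide-level numbers could be computed
roughly as below. The function name and the 1000-bp toy example are
hypothetical, and this is not necessarily how the evaluation script
behind these tables works.

```python
import math

def nucleotide_metrics(real_nt, pred_nt, seq_len):
    """Nucleotide-level Sn, Sp and correlation coefficient (Burset-Guigo style).

    real_nt, pred_nt: sets of actual and predicted coding positions in 1..seq_len.
    """
    tp = len(real_nt & pred_nt)   # coding bases predicted as coding
    fp = len(pred_nt - real_nt)   # non-coding bases predicted as coding
    fn = len(real_nt - pred_nt)   # coding bases that were missed
    tn = seq_len - tp - fp - fn   # non-coding bases correctly left unpredicted
    sn = tp / (tp + fn)
    sp = tp / (tp + fp)           # gene-finding "specificity" = TP / (TP + FP)
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else 0.0
    return sn, sp, cc

# Hypothetical 1000-bp sequence: real CDS at 101-300, prediction at 151-350.
real = set(range(101, 301))
pred = set(range(151, 351))
print(nucleotide_metrics(real, pred, 1000))
```

Unlike Sn and Sp, C also uses the true negatives, so it penalizes a
prediction that is shifted relative to the real coding region even
when the overlap is large.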

The dataset of 38 human genomic sequences
(19 genes 20,000-240,000 bp long + 19 multiple-gene sequences):
=============================================================
                                        Fgenes 1.6:
     ALL EXONS: OBSERVED - 705 EXACTLY PREDICTED: 590 84%
     (averaged over all exons)
     Sne- 83.7 Spe- 68.3 Sn_n- 92.0 Sp_n- 75.9 C- 0.84
     no prediction cases - 1
     Init: Observed - 71 Predicted - 118 Correct - 50 70%
     Intr: Observed - 557 Predicted - 624 Correct - 489 88%
     Term: Observed - 71 Predicted - 116 Correct - 51 72%
     Sngl: Observed - 6 Predicted - 6 Correct - 0
                                        Genscan:
     ALL EXONS: OBSERVED - 705 EXACTLY PREDICTED: 553 78%
     (averaged over all exons)
     Sne- 78.4 Spe- 66.1 Sn_n- 92.4 Sp_n- 69.8 C- 0.80
     no prediction cases - 1
     Init: Observed - 71 Predicted - 93 Correct - 36 51%
     Intr: Observed - 557 Predicted - 635 Correct - 469 84%
     Term: Observed - 71 Predicted - 98 Correct - 48 68%
     Sngl: Observed - 6 Predicted - 11 Correct - 0
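The percentage after each Init/Intr/Term line matches Correct/Observed,
i.e. per-type sensitivity. A short sketch that recomputes it from the
counts in the 38-sequence table and also derives per-type specificity
(Correct/Predicted), which the table does not print:

```python
# Counts copied from the 38-sequence table: (observed, predicted, correct).
TABLE = {
    "Fgenes 1.6": {"Init": (71, 118, 50), "Intr": (557, 624, 489), "Term": (71, 116, 51)},
    "Genscan":    {"Init": (71, 93, 36),  "Intr": (557, 635, 469), "Term": (71, 98, 48)},
}

def sn_sp(observed, predicted, correct):
    """Exon-level sensitivity (correct/observed) and specificity (correct/predicted), in %."""
    return 100 * correct / observed, 100 * correct / predicted

for program, rows in TABLE.items():
    for exon_type, counts in rows.items():
        sn, sp = sn_sp(*counts)
        print(f"{program:<11} {exon_type}: Sn {sn:5.1f}%  Sp {sp:5.1f}%")
```

For example, Fgenes 1.6 initial exons come out at about 70% sensitivity
but about 42% specificity (50 of 118 predicted), and Genscan initial
exons at about 51% and 39% (36 of 93).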

 FGENES 1.6 Prediction of multiple genes in genomic DNA
 Time: 18:22:57 Date: Sat Nov 21 1998
 Seq name: >    HSIFNAR     32906 bp    DNA             PRI       25-NO
 Length of sequence:   32906 GC content: 0.41 Zone: 1
 Number of predicted genes:   1 In +chain:   1 In -chain:   0
 Number of predicted exons:  11 In +chain:  11 In -chain:   0
 Positions of predicted genes and exons:
  G Str Feature  Start       End   Weight  ORF-start ORF-end

  1 +   1 CDSf     754 -     829   13.59     754 -     828
  1 +   2 CDSi   11201 -   11324    4.41   11203 -   11322
  1 +   3 CDSi   16768 -   16943    4.15   16769 -   16942
  1 +   4 CDSi   19033 -   19187    3.66   19035 -   19187
  1 +   5 CDSi   19300 -   19441    2.77   19300 -   19440
  1 +   6 CDSi   21017 -   21131    3.76   21019 -   21129
  1 +   7 CDSi   24861 -   25060    3.24   24862 -   25059
  1 +   8 CDSi   25159 -   25313    2.63   25161 -   25313
  1 +   9 CDSi   28528 -   28678    5.40   28528 -   28677
  1 +  10 CDSi   29408 -   29553    3.02   29410 -   29553
  1 +  11 CDSl   31085 -   31318    4.23   31085 -   31315
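As a consistency check on the HSIFNAR output, the exon coordinates can
be summed to get the coding length. This sketch assumes Start and End
are inclusive and that the last (CDSl) exon includes the stop codon,
which fits its End lying three bases past its ORF-end.

```python
# (Start, End) of each predicted coding exon for HSIFNAR, copied from the table.
EXONS = [
    (754, 829), (11201, 11324), (16768, 16943), (19033, 19187),
    (19300, 19441), (21017, 21131), (24861, 25060), (25159, 25313),
    (28528, 28678), (29408, 29553), (31085, 31318),
]

def cds_length(exons):
    """Total coding length in bp, treating Start and End as inclusive."""
    return sum(end - start + 1 for start, end in exons)

length = cds_length(EXONS)
codons, remainder = divmod(length, 3)
print(length, codons, remainder)
```

This gives 1674 bp, i.e. 558 codons with no remainder; dropping the stop
codon leaves 557 residues, which agrees with the "557 a" length in the
predicted protein header.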

Predicted proteins:
>FGENES 1.5 >    HSIFNAR      1 Multiexon gene     754 -   31318     557 a Ch+
MMVVLLGATTLVLVAVAPWVLSAAAGGKNLKSPQKVEVDIIDDNFILRWNRSDESVGNVT
FSFDYQKTGMDNWIKLSGCQNITSTKCNFSSLKLNVYEEIKLRIRAEKENTSSWYEVDSF
TPFRKAQIGPPEVHLEAEDKAIVIHISPGTKDSVMWALDGLSFTYSLLIWKNSSGVEERI
ENIYSRHKIYKLSPETTYCLKVKAALLTSWKIGVYSPVHCIKTTVENELPPPENIEVSVQ
NQNYVLKWDYTYANMTFQVQWLHAFLKRNPGNHLYKWKQIPDCENVKTTQCVFPQNVFQK
GIYLLRVQASDGNNTSFWSEEIKFDTEIQAFLLPPVFNIRSLSDSFHIYIGAPKQSGNTP
VIQDYPLIYEIIFWENTSNAERKIIEKKTDVTVPNLKPLTVYCVKARAHTMDEKLNKSSV
FSDAVCEKTKPGNTSKIWLIVGICIALFALPFVIYAAKVFLRCINYVFFPSLKPSSSIDE
YFSEQPLKNLLLSTSEEQIEKCFIIENISTIATVEETNQTDEDHKKYSSQTSQDSGNYSN
EDESESKTSEELQQDFV

-- 
Victor Solovyev
The Sanger Centre, Hinxton, Cambridge CB10 1SA, UK
Email: solovyev at sanger.ac.uk  http://genomic.sanger.ac.uk
Phone: 44-1223-494799  FAX:   44-1223-494919