New Prediction of Variants of gene structure FGENES-M

Victor Solovyev solovyev at sanger.ac.uk
Mon Oct 12 13:29:49 EST 1998


New Prediction of Variants of gene structure FGENES-M
======================================================
      The Fgenes-M variant for mammalian sequences is available at the CGG web site:

      http://genomic.sanger.ac.uk/gf/gf.html

   There are two reasons to predict several suboptimal variants of
gene structure (instead of only one):

1) Gene prediction algorithms for long genomic sequences are
   only 70-80% accurate on average, so the real structure
   might have a score slightly lower than the optimal variant
   that is reported (and you would never see it if only
   one prediction were produced);
2) Mammalian genes are often alternatively spliced, and your
   sequenced mRNA might not correspond to the predicted variant
   (in this case several gene structures are actually real).

It is possible to generate thousands of alternative gene structures,
and there is currently no established way to generate variants
that correspond exactly to the real ones.
The Fgenes-M variant has proved useful in providing a set of
possible gene structures for further experimental testing in
commercial gene hunting, so I decided to put it on the WWW.

FGENES-M 1.5 - Pattern-based prediction of multiple variants of human
gene structure

   The algorithm outputs several suboptimal variants of the predicted
gene structure. The current WWW server version provides up to 10
structures of a gene or of multiple genes.
   It is similar to FGENES: it is based on pattern recognition of
different types of exons, promoters and polyA signals, and it uses
dynamic programming to find the optimal combination of them,
constructing a set of gene models along a given sequence.
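
The dynamic programming idea can be illustrated with a small sketch.
This is not the FGENES-M implementation: it assumes candidate exons
are given as hypothetical (start, end, weight) tuples, scores a chain
as the plain sum of exon weights, and ignores reading frame, exon
type, promoters and polyA signals. It only shows how the k
best-scoring chains of non-overlapping exons can be collected.

```python
from typing import List, Tuple

# Hypothetical candidate exon: (start, end, weight). Not an FGENES-M type.
Exon = Tuple[int, int, float]
Chain = Tuple[float, List[Exon]]


def best_chains(exons: List[Exon], k: int = 3) -> List[Chain]:
    """Return up to k highest-scoring chains of non-overlapping exons.

    Assumption: the score of a chain is the sum of its exon weights.
    """
    exons = sorted(exons, key=lambda e: e[1])  # order by end coordinate
    # top[i] holds up to k best (score, chain) pairs ending with exons[i]
    top: List[List[Chain]] = []
    for i, (start, _end, weight) in enumerate(exons):
        candidates: List[Chain] = [(weight, [exons[i]])]
        for j in range(i):
            if exons[j][1] < start:  # exon j ends before exon i starts
                for score, chain in top[j]:
                    candidates.append((score + weight, chain + [exons[i]]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        top.append(candidates[:k])
    all_chains = [c for per_exon in top for c in per_exon]
    all_chains.sort(key=lambda c: c[0], reverse=True)
    return all_chains[:k]
```

Keeping k partial solutions per exon instead of one is what turns a
single optimal prediction into a ranked list of suboptimal variants.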

You can assess the validity of a predicted variant using its GENE WEIGHT:
if it is close to that of the 1st (optimal) variant, then it is worth considering.
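
As a rough illustration of this comparison, the sketch below pulls the
GENE WEIGHT of each variant out of a saved report and keeps the
variants whose weight is within a chosen fraction of the first one.
The function names and the 0.6 cutoff are assumptions made for the
example; no numerical threshold is defined by FGENES-M.

```python
import re
from typing import List, Tuple

# Matches header lines such as:
#  Predicted genes and exons in var:   1 Max var=    5 GENE WEIGHT:   24.1
HEADER = re.compile(
    r"var:\s*(\d+)\s+Max var=\s*(\d+)\s+GENE WEIGHT:\s*([\d.]+)"
)


def variant_weights(report: str) -> List[Tuple[int, float]]:
    """Return (variant number, GENE WEIGHT) pairs in report order."""
    return [(int(m.group(1)), float(m.group(3)))
            for m in HEADER.finditer(report)]


def close_to_optimal(weights: List[Tuple[int, float]],
                     min_ratio: float = 0.6) -> List[Tuple[int, float, float]]:
    """Keep variants whose weight is at least min_ratio of the first one.

    min_ratio is a hypothetical cutoff; the first weight is assumed positive.
    """
    if not weights:
        return []
    best = weights[0][1]
    return [(var, w, w / best) for var, w in weights if w / best >= min_ratio]
```

With the weights 24.1, 15.1, 15.1, 15.1 and 13.9 from the example
report that follows, a cutoff of 0.6 would keep variants 1 to 4 and
drop variant 5.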


A simple example of Fgenes-M output:

 FGENES-M 1.5.0 Prediction of several variants of  multiple genes
 Time: 214127.7 Date: 19981003
 Seq name:  ACU08131
 Length of sequence:    5392 GC content: 0.46 Zone: 2
 Number of predicted genes:   1 In +chain:   1 In -chain:   0
 Number of predicted exons:   6 In +chain:   6 In -chain:   0
 Predicted genes and exons in var:   1 Max var=    5 GENE WEIGHT:   24.1
  G Str Feature  Start       End   Weight  ORF-start ORF-end

  1 +     TSS      355              7.43 TATA    327 wTATA   21.08 LDF   0.56
  1 +   1 CDSf     521 -     641    1.23     521 -     640
  1 +   2 CDSi    1066 -    1362    2.08    1068 -    1361
  1 +   3 CDSi    1860 -    2028    1.69    1862 -    2026
  1 +   4 CDSi    2637 -    2802    2.74    2638 -    2802
  1 +   5 CDSi    3558 -    3797    4.35    3558 -    3797
  1 +   6 CDSl    4131 -    4247    2.09    4131 -    4244
  1 +     PolA    4650              3.17

Predicted proteins:
>FGENES 1.5  ACU08131         1 Multiexon gene     521 -    4247     369 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTVFACFAAANPGY
AFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKVDDGSELSSTSRTEVSS
VSNSSVSPA
 FGENES-M 1.5.0 Prediction of several variants of  multiple genes
 Time: 214127.7 Date: 19981003
 Seq name:  ACU08131
 Length of sequence:    5392 GC content: 0.46 Zone: 2
 Number of predicted genes:   1 In +chain:   1 In -chain:   0
 Number of predicted exons:   6 In +chain:   6 In -chain:   0
 Predicted genes and exons in var:   2 Max var=    5 GENE WEIGHT:   15.1
  G Str Feature  Start       End   Weight  ORF-start ORF-end

  1 +   1 CDSf     218 -     321    1.01     218 -     319
  1 +   2 CDSi     984 -    1023    1.94     985 -    1023
  1 +   3 CDSi    1860 -    2028    1.49    1860 -    2027
  1 +   4 CDSi    2675 -    2802    1.00    2677 -    2802
  1 +   5 CDSi    3558 -    3797    4.35    3558 -    3797
  1 +   6 CDSl    4131 -    4247    2.09    4131 -    4244
  1 +     PolA    4650              3.17

Predicted proteins:
>FGENES 1.5  ACU08131         1 Multiexon gene     218 -    4247     265 a Ch+
MRQGGGQITAQLRDKTFKGFEDLVLQVRGLIRLGGNLLVDVCVVIAILVSQLSGPWPLYL
GNAGSLSASPLEMSSSMPNWPWLALSSPGCGLLYGQHHPSLAGVDVFSGSDDPGVLSYMI
VLMITCCFIPLAVILLCYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCW
GPYTVFACFAAANPGYAFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKV
DDGSELSSTSRTEVSSVSNSSVSPA
 FGENES-M 1.5.0 Prediction of several variants of  multiple genes
 Time: 214127.7 Date: 19981003
 Seq name:  ACU08131
 Length of sequence:    5392 GC content: 0.46 Zone: 2
 Number of predicted genes:   1 In +chain:   1 In -chain:   0
 Number of predicted exons:   6 In +chain:   6 In -chain:   0
 Predicted genes and exons in var:   3 Max var=    5 GENE WEIGHT:   15.1
  G Str Feature  Start       End   Weight  ORF-start ORF-end

  1 +   1 CDSf     218 -     321    1.01     218 -     319
  1 +   2 CDSi     984 -    1023    1.94     985 -    1023
  1 +   3 CDSi    1860 -    2028    1.49    1860 -    2027
  1 +   4 CDSi    2675 -    2802    1.00    2677 -    2802
  1 +   5 CDSi    3558 -    3797    4.35    3558 -    3797
  1 +   6 CDSl    4131 -    4247    2.09    4131 -    4244
  1 +     PolA    4650              3.17

Predicted proteins:
>FGENES 1.5  ACU08131         1 Multiexon gene     218 -    4247     265 a Ch+
MRQGGGQITAQLRDKTFKGFEDLVLQVRGLIRLGGNLLVDVCVVIAILVSQLSGPWPLYL
GNAGSLSASPLEMSSSMPNWPWLALSSPGCGLLYGQHHPSLAGVDVFSGSDDPGVLSYMI
VLMITCCFIPLAVILLCYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCW
GPYTVFACFAAANPGYAFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKV
DDGSELSSTSRTEVSSVSNSSVSPA
 FGENES-M 1.5.0 Prediction of several variants of  multiple genes
 Time: 214127.7 Date: 19981003
 Seq name:  ACU08131
 Length of sequence:    5392 GC content: 0.46 Zone: 2
 Number of predicted genes:   1 In +chain:   1 In -chain:   0
 Number of predicted exons:   6 In +chain:   6 In -chain:   0
 Predicted genes and exons in var:   4 Max var=    5 GENE WEIGHT:   15.1
  G Str Feature  Start       End   Weight  ORF-start ORF-end

  1 +   1 CDSf     218 -     321    1.01     218 -     319
  1 +   2 CDSi     984 -    1023    1.94     985 -    1023
  1 +   3 CDSi    1860 -    2028    1.49    1860 -    2027
  1 +   4 CDSi    2675 -    2802    1.00    2677 -    2802
  1 +   5 CDSi    3558 -    3797    4.35    3558 -    3797
  1 +   6 CDSl    4131 -    4247    2.09    4131 -    4244
  1 +     PolA    4650              3.17

Predicted proteins:
>FGENES 1.5  ACU08131         1 Multiexon gene     218 -    4247     265 a Ch+
MRQGGGQITAQLRDKTFKGFEDLVLQVRGLIRLGGNLLVDVCVVIAILVSQLSGPWPLYL
GNAGSLSASPLEMSSSMPNWPWLALSSPGCGLLYGQHHPSLAGVDVFSGSDDPGVLSYMI
VLMITCCFIPLAVILLCYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCW
GPYTVFACFAAANPGYAFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKV
DDGSELSSTSRTEVSSVSNSSVSPA
 FGENES-M 1.5.0 Prediction of several variants of  multiple genes
 Time: 214127.7 Date: 19981003
 Seq name:  ACU08131
 Length of sequence:    5392 GC content: 0.46 Zone: 2
 Number of predicted genes:   1 In +chain:   1 In -chain:   0
 Number of predicted exons:   6 In +chain:   6 In -chain:   0
 Predicted genes and exons in var:   5 Max var=    5 GENE WEIGHT:   13.9
  G Str Feature  Start       End   Weight  ORF-start ORF-end

  1 +     TSS      355              7.43 TATA    327 wTATA   21.08 LDF   0.56
  1 +   1 CDSf     521 -     641    1.23     521 -     640
  1 +   2 CDSi    1066 -    1362    2.08    1068 -    1361
  1 +   3 CDSi    1860 -    2028    1.69    1862 -    2026
  1 +   4 CDSi    2637 -    2802    2.74    2638 -    2802
  1 +   5 CDSi    3558 -    3668    0.99    3558 -    3668
  1 +   6 CDSl    4131 -    4247    2.09    4131 -    4244
  1 +     PolA    4650              3.17

Predicted proteins:
>FGENES 1.5  ACU08131         1 Multiexon gene     521 -    4247     326 a Ch+
MAGTVTEAWDVAVFAARRRNDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITS
VWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISG
YFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSW
VWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILL
CYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTFRNCIMQLFGKK
VDDGSELSSTSRTEVSSVSNSSVSPA
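
The exon tables above can be cross-checked against the reported
protein lengths. The sketch below parses the CDS lines of one variant
and sums the exon lengths. It assumes the column layout shown in this
example and that the last exon (CDSl) includes the stop codon, which
is consistent with the sample: variant 1 has 1110 coding bases, i.e.
370 codons, matching the reported 369 amino acids plus a stop. The
function names are illustrative only.

```python
import re
from typing import List, Tuple

# Matches exon lines such as:
#   1 +   1 CDSf     521 -     641    1.23     521 -     640
# TSS and PolA lines have no exon number and are skipped.
EXON = re.compile(
    r"^\s*\d+\s+[+-]\s+\d+\s+(CDS\w)\s+(\d+)\s+-\s+(\d+)\s+([\d.]+)"
)

ExonRow = Tuple[str, int, int, float]  # (feature, start, end, weight)


def parse_exons(block: str) -> List[ExonRow]:
    """Extract CDS rows from the text of one predicted variant."""
    rows: List[ExonRow] = []
    for line in block.splitlines():
        m = EXON.match(line)
        if m:
            rows.append((m.group(1), int(m.group(2)),
                         int(m.group(3)), float(m.group(4))))
    return rows


def coding_length(exons: List[ExonRow]) -> int:
    """Total number of bases covered by the exons (inclusive coordinates)."""
    return sum(end - start + 1 for _, start, end, _ in exons)


def expected_protein_length(exons: List[ExonRow]) -> int:
    """Codon count minus one, assuming the stop codon is in the last exon."""
    return coding_length(exons) // 3 - 1
```

The same check gives 265 residues for variants 2 to 4 (798 bases) and
326 for variant 5 (981 bases), in agreement with the headers of the
predicted proteins.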



-- 
Victor Solovyev
The Sanger Centre, Hinxton, Cambridge CB10 1SA, UK
Email: solovyev at sanger.ac.uk  http://genomic.sanger.ac.uk
Phone: 44-1223-494799  FAX:   44-1223-494919



