Nicotiana tabacum Gene finding parameters for FGENESH

Softberry Team softberry at softberry.com
Tue Jan 7 07:23:06 EST 2003


      Nicotiana tabacum Gene finding parameters for FGENESH

              The program, with parameters for many major model organisms,
                    is available for online use at:

              http://www.softberry.com/berry.phtml?topic=gfind

  Method description:

A new parameter set for gene prediction in the tobacco genome has been
developed for the FGENESH program. The accuracy of predicting Nicotiana
tabacum protein-coding genes is about 98% at the nucleotide level. Note that
the Arabidopsis parameters are about 20% less accurate for predicting tobacco
genes.

The FGENESH algorithm is based on pattern recognition of different types of
signals and on Markov chain models of coding regions. The optimal combination
of these features is then found by dynamic programming, and a set of gene
models is constructed along the given sequence.
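
To illustrate the dynamic programming step in general terms, here is a
minimal sketch that picks the highest-scoring chain of non-overlapping
candidate exons. It is not the FGENESH implementation: the function name
best_exon_chain, the tuple layout and the two overlapping "decoy" candidates
are assumptions made for this example, and a real gene finder would also
check reading frame, splice signals and intron lengths.

```python
from typing import List, Tuple

# (start, end, score), 1-based inclusive coordinates.
# Hypothetical layout chosen for this sketch, not an FGENESH data structure.
Exon = Tuple[int, int, float]


def best_exon_chain(candidates: List[Exon]) -> Tuple[float, List[Exon]]:
    """Return (total score, chain) for the best set of non-overlapping exons."""
    exons = sorted(candidates, key=lambda e: e[1])  # order by end position
    n = len(exons)
    if n == 0:
        return 0.0, []

    best = [0.0] * n   # best[i]: best total score of a chain ending at exon i
    prev = [-1] * n    # prev[i]: index of the previous exon in that chain

    for i, (start, _end, score) in enumerate(exons):
        best[i] = score
        for j in range(i):
            # exon j may precede exon i only if it ends before exon i starts
            if exons[j][1] < start and best[j] + score > best[i]:
                best[i] = best[j] + score
                prev[i] = j

    # trace back from the best-scoring chain end
    i = max(range(n), key=best.__getitem__)
    total = best[i]
    chain: List[Exon] = []
    while i != -1:
        chain.append(exons[i])
        i = prev[i]
    chain.reverse()
    return total, chain


# Two candidates use coordinates and scores from the sample output in this
# post; the other two are made-up overlapping decoys.
candidates = [
    (201, 426, 33.05),
    (300, 500, 10.0),   # hypothetical decoy
    (608, 684, 19.89),
    (600, 700, 5.0),    # hypothetical decoy
]
total, chain = best_exon_chain(candidates)
print(chain)
```

The quadratic inner loop keeps the sketch short; it only shows how
per-feature scores can be combined into one globally optimal gene model.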

FGENESH is the fastest and most accurate ab initio gene prediction program
available. It can process whole chromosome sequences.

  FGENESH output:

fgenesh  Thu Dec 19 17:05:00 EST 2002
 FGENESH 1.1 Prediction of potential genes in Nicotiana_dicot genomic DNA
 Time    :   Thu Dec 19 17:05:00 2002
 Seq name: >putrescine N-methyltransferase, NsPMT1
 Length of sequence: 2209 
 Number of predicted genes 1 in +chain 1 in -chain 0
 Number of predicted exons 8 in +chain 8 in -chain 0
 Positions of predicted genes and exons:
   G Str   Feature   Start        End    Score           ORF           Len

   1 +      TSS         83               -4.38
   1 +    1 CDSf       201 -       426   33.05       201 -       425    225
   1 +    2 CDSi       608 -       684   19.89       610 -       684     75
   1 +    3 CDSi       767 -       994   37.57       767 -       994    228
   1 +    4 CDSi      1100 -      1172   17.59      1100 -      1171     72
   1 +    5 CDSi      1283 -      1354    6.75      1285 -      1353     69
   1 +    6 CDSi      1444 -      1639   24.78      1446 -      1637    192
   1 +    7 CDSi      1802 -      1934   19.43      1803 -      1934    132
   1 +    8 CDSl      2033 -      2089   12.26      2033 -      2089     57
   1 +      PolA      2173               -0.55

Predicted protein(s):
>FGENESH:   1   8 exon (s)    201  -   2089   353 aa, chain +
MEVISTNTNGSTIFKSGAIPMNGHQNGTSKHQNGHKNGTSEEQNGTISHDNGNELLGNSN
CIKPGWFSEFSALWPGEAFSLKVEKLLFQGKSDYQDVMLFESATYGKVLTLDGAIQHTEN
GGFPYTEMIVHLPLGSIPNPKKVLIIGGGIGFTLFEMLRYPTIEKIDIVEIDDVVVDVSR
KFFPYLAANFNDPRVTLVLGDGAAFVKAAQAEYYDAIIVDSSDPIGPAKDLFERPFFEAV
AKALRPGGVVCTQAESIWLHMHIIKQIIANCRQVFKGSVNYAWTTVPTYPTGVIGYMLCS
TEGPEIDFKNPVNPIDKETAQVKSKLAPLKFYNSDIHKAAFILPSFARSMIES
---
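
For readers who want to post-process a report like the one above, the
following is a small sketch that reads the CDS rows of the table and checks
them against the reported protein length. The regular expression and the
names Exon, parse_exons and coding_length are assumptions based only on the
layout of this one sample, not an official FGENESH format specification.

```python
import re
from typing import List, NamedTuple


class Exon(NamedTuple):
    gene: int
    strand: str
    number: int
    kind: str      # CDSf (first), CDSi (internal), CDSl (last)
    start: int
    end: int
    score: float


# Matches rows such as:
#    1 +    1 CDSf       201 -       426   33.05 ...
# TSS and PolA rows have no exon number and are skipped.
EXON_LINE = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\d+)\s+(CDS\w)\s+(\d+)\s+-\s+(\d+)\s+(-?\d+\.\d+)"
)


def parse_exons(report: str) -> List[Exon]:
    """Extract the CDS rows from the text of a report."""
    exons = []
    for line in report.splitlines():
        m = EXON_LINE.match(line)
        if m:
            g, strand, num, kind, start, end, score = m.groups()
            exons.append(Exon(int(g), strand, int(num), kind,
                              int(start), int(end), float(score)))
    return exons


def coding_length(exons: List[Exon]) -> int:
    """Total number of coding nucleotides (coordinates are inclusive)."""
    return sum(e.end - e.start + 1 for e in exons)


REPORT = """\
   1 +      TSS         83               -4.38
   1 +    1 CDSf       201 -       426   33.05       201 -       425    225
   1 +    2 CDSi       608 -       684   19.89       610 -       684     75
   1 +    3 CDSi       767 -       994   37.57       767 -       994    228
   1 +    4 CDSi      1100 -      1172   17.59      1100 -      1171     72
   1 +    5 CDSi      1283 -      1354    6.75      1285 -      1353     69
   1 +    6 CDSi      1444 -      1639   24.78      1446 -      1637    192
   1 +    7 CDSi      1802 -      1934   19.43      1803 -      1934    132
   1 +    8 CDSl      2033 -      2089   12.26      2033 -      2089     57
   1 +      PolA      2173               -0.55
"""

exons = parse_exons(REPORT)
total = coding_length(exons)
# 1062 coding nucleotides = 354 codons = 353 amino acids plus a stop codon
print(len(exons), total, total // 3 - 1)   # prints: 8 1062 353
```

The eight exons add up to 1062 nucleotides, which agrees with the 353 aa
protein reported above once the stop codon is counted.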



