Nicotiana tabacum Gene finding parameters for FGENESH

Softberry Team softberry at softberry.com
Thu Dec 19 20:01:02 EST 2002


       Nicotiana tabacum Gene finding parameters for FGENESH

               The program, with parameters for many major model organisms,
                     is available for online use at:

               http://www.softberry.com/berry.phtml?topic=gfind

   Method description:

A new parameter set for gene prediction in the tobacco genome has been
developed for the FGENESH program. The accuracy of prediction of Nicotiana
tabacum protein-coding genes is about 98% at the nucleotide level. Note that
the Arabidopsis parameters are about 20% less accurate for prediction of
tobacco genes.
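
The post does not say how nucleotide-level accuracy was measured. One common
convention in gene-finder evaluation scores every base as coding or
non-coding and reports sensitivity (the fraction of annotated coding bases
that were predicted) and specificity (the fraction of predicted coding bases
that are annotated). The sketch below assumes that convention; the function
names and the example coordinates are illustrative and not taken from
FGENESH.

```python
def coding_positions(exons):
    """Expand (start, end) exon intervals (1-based, inclusive) into a set of bases."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) at the nucleotide level.

    Assumed definitions (not stated in the post):
      sensitivity = correctly predicted coding bases / annotated coding bases
      specificity = correctly predicted coding bases / predicted coding bases
    """
    pred = coding_positions(predicted)
    true = coding_positions(annotated)
    shared = len(pred & true)
    sensitivity = shared / len(true) if true else 0.0
    specificity = shared / len(pred) if pred else 0.0
    return sensitivity, specificity


# Hypothetical example: the second predicted exon starts two bases late.
annotated = [(201, 426), (608, 684)]
predicted = [(201, 426), (610, 684)]
sn, sp = nucleotide_accuracy(predicted, annotated)
print(f"sensitivity={sn:.4f} specificity={sp:.4f}")
```

Expanding intervals into sets keeps the sketch short; for chromosome-scale
input an interval-overlap computation would be the better choice.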

The FGENESH algorithm is based on pattern recognition of different types of
signals and Markov chain models of coding regions. The optimal combination of
these features is then found by dynamic programming, and a set of gene
models is constructed along the given sequence.
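
To illustrate the dynamic programming step, here is a deliberately
simplified sketch: given candidate exons with scores, it picks the
non-overlapping chain with the highest total score. This is not the FGENESH
implementation, which also has to model splice signals, reading frame and
intron states. The function name is hypothetical, and the two overlapping
candidates (300-500 and 650-900) with their scores are invented for the
example; the other three are the first exons from the sample output.

```python
from bisect import bisect_left


def best_exon_chain(candidates):
    """Pick the highest-scoring set of non-overlapping exons.

    candidates: list of (start, end, score), coordinates 1-based inclusive.
    Returns (total_score, chain) with the chain in left-to-right order.
    """
    exons = sorted(candidates, key=lambda e: e[1])  # sort by end coordinate
    ends = [e[1] for e in exons]
    n = len(exons)
    best = [0.0] * (n + 1)    # best[i]: best score using the first i exons
    taken = [False] * (n + 1)
    prev = [0] * (n + 1)      # prev[i]: how many exons end before exon i starts
    for i, (start, _end, score) in enumerate(exons, 1):
        prev[i] = bisect_left(ends, start)
        with_exon = best[prev[i]] + score
        if with_exon > best[i - 1]:
            best[i], taken[i] = with_exon, True
        else:
            best[i] = best[i - 1]
    # Trace back from the end to recover the chosen chain.
    chain, i = [], n
    while i > 0:
        if taken[i]:
            chain.append(exons[i - 1])
            i = prev[i]
        else:
            i -= 1
    chain.reverse()
    return best[n], chain


candidates = [
    (201, 426, 33.05),
    (300, 500, 10.0),   # invented overlapping candidate
    (608, 684, 19.89),
    (650, 900, 25.0),   # invented overlapping candidate
    (767, 994, 37.57),
]
total, chain = best_exon_chain(candidates)
print(round(total, 2), chain)
```

On this toy input the chain keeps the three exons from the sample output and
drops both invented candidates, because each of them overlaps a
higher-scoring alternative.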

FGENESH is the fastest and most accurate ab initio gene prediction program
available. It can process whole chromosome sequences.

   FGENESH output:

fgenesh  Thu Dec 19 17:05:00 EST 2002
  FGENESH 1.1 Prediction of potential genes in Nicotiana_dicot genomic DNA
  Time    :   Thu Dec 19 17:05:00 2002
  Seq name: >putrescine N-methyltransferase, NsPMT1
  Length of sequence: 2209
  Number of predicted genes 1 in +chain 1 in -chain 0
  Number of predicted exons 8 in +chain 8 in -chain 0
  Positions of predicted genes and exons:
    G Str   Feature   Start        End    Score           ORF           Len

    1 +      TSS         83               -4.38
    1 +    1 CDSf       201 -       426   33.05       201 -       425    225
    1 +    2 CDSi       608 -       684   19.89       610 -       684     75
    1 +    3 CDSi       767 -       994   37.57       767 -       994    228
    1 +    4 CDSi      1100 -      1172   17.59      1100 -      1171     72
    1 +    5 CDSi      1283 -      1354    6.75      1285 -      1353     69
    1 +    6 CDSi      1444 -      1639   24.78      1446 -      1637    192
    1 +    7 CDSi      1802 -      1934   19.43      1803 -      1934    132
    1 +    8 CDSl      2033 -      2089   12.26      2033 -      2089     57
    1 +      PolA      2173               -0.55

Predicted protein(s):
>FGENESH:   1   8 exon (s)    201  -   2089   353 aa, chain +
MEVISTNTNGSTIFKSGAIPMNGHQNGTSKHQNGHKNGTSEEQNGTISHDNGNELLGNSN
CIKPGWFSEFSALWPGEAFSLKVEKLLFQGKSDYQDVMLFESATYGKVLTLDGAIQHTEN
GGFPYTEMIVHLPLGSIPNPKKVLIIGGGIGFTLFEMLRYPTIEKIDIVEIDDVVVDVSR
KFFPYLAANFNDPRVTLVLGDGAAFVKAAQAEYYDAIIVDSSDPIGPAKDLFERPFFEAV
AKALRPGGVVCTQAESIWLHMHIIKQIIANCRQVFKGSVNYAWTTVPTYPTGVIGYMLCS
TEGPEIDFKNPVNPIDKETAQVKSKLAPLKFYNSDIHKAAFILPSFARSMIES
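
The feature table in the output above is easy to read back by program. The
sketch below parses the CDS rows and checks them against the reported protein
length. The column layout assumed by the regular expression (gene, strand,
exon number, feature type, start - end, score) is inferred from this one
sample and is not an official format specification; the names
`parse_cds_rows` and `CDS_ROW` are illustrative.

```python
import re

# Feature rows copied verbatim from the FGENESH output above.
FGENESH_ROWS = """
    1 +      TSS         83               -4.38
    1 +    1 CDSf       201 -       426   33.05       201 -       425    225
    1 +    2 CDSi       608 -       684   19.89       610 -       684     75
    1 +    3 CDSi       767 -       994   37.57       767 -       994    228
    1 +    4 CDSi      1100 -      1172   17.59      1100 -      1171     72
    1 +    5 CDSi      1283 -      1354    6.75      1285 -      1353     69
    1 +    6 CDSi      1444 -      1639   24.78      1446 -      1637    192
    1 +    7 CDSi      1802 -      1934   19.43      1803 -      1934    132
    1 +    8 CDSl      2033 -      2089   12.26      2033 -      2089     57
    1 +      PolA      2173               -0.55
"""

# Assumed layout of a CDS row; TSS and PolA rows do not match and are skipped.
CDS_ROW = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\d+)\s+(CDS\w)\s+(\d+)\s+-\s+(\d+)\s+(-?\d+\.\d+)"
)


def parse_cds_rows(text):
    """Return one dict per CDS row found in the FGENESH feature table."""
    exons = []
    for line in text.splitlines():
        match = CDS_ROW.match(line)
        if match is None:
            continue
        gene, strand, number, kind, start, end, score = match.groups()
        exons.append({
            "gene": int(gene),
            "strand": strand,
            "exon": int(number),
            "kind": kind,
            "start": int(start),
            "end": int(end),
            "score": float(score),
        })
    return exons


exons = parse_cds_rows(FGENESH_ROWS)
coding_length = sum(e["end"] - e["start"] + 1 for e in exons)
print(len(exons), coding_length, coding_length // 3)  # prints: 8 1062 354
```

The eight exons add up to 1062 coding bases, or 354 codons. That is one more
than the 353 aa reported for the predicted protein, which would be consistent
with the last exon including the stop codon, although the output itself does
not say so.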