FGENESH-2: Improving gene finding by using similar genomic regions of 2 organisms

webmaster webmaster at softberry.com
Fri Nov 10 03:55:43 EST 2000


FGENESH-2: improving gene finding accuracy by using similar genomic regions of 
2 organisms (such as Human and Mouse) that contain homologous genes.

This FGENESH-type program, which predicts multiple genes in genomic DNA sequences 
using an HMM gene model, is available for public use at: 
http://www.softberry.com/gfs.html 

Ab initio gene prediction programs usually predict a significant fraction of the exons in a gene correctly, but they often 
assemble the gene incorrectly: they merge several genes into one, split one gene into several, skip exons or include false exons. 
Using information from 2 organisms can significantly improve the accuracy of EXACT gene finding, taking into account that 
the Human genome draft sequence and the Mouse genomic sequence provide many homologous sequences. 

The program shows the predicted genes in both sequences as 2 sequential FGENESH outputs. 
EXAMPLE of output for genes predicted in Human and Mouse genomic sequences: 
 
 Organism: h Given similarity: 96
 FGENESH-2 1.C Prediction of potential genes in 1st genomic DNA
 Time:   Fri Nov 10 02:55:51 2000
 Seq name: HSCKIIBE
 Length of sequence:  5917  GC content: 53 Zone: 3
 Number of predicted genes 1 in +chain 1 in -chain 0
 Number of predicted exons 6 in +chain 6 in -chain 0
 Positions of predicted genes and exons:
  G Str Feature    Start     End   Score        ORF           Len

  1 +   1 CDSf    1634 -    1705     18.99    1634 -    1705     72
  1 +   2 CDSi    2672 -    2774     38.26    2672 -    2773    102
  1 +   3 CDSi    3344 -    3459     41.09    3346 -    3459    114
  1 +   4 CDSi    3906 -    3981     25.73    3906 -    3980     75
  1 +   5 CDSi    4128 -    4317     67.44    4130 -    4315    186
  1 +   6 CDSl    4645 -    4735     29.35    4646 -    4735     90
  1 +     PolA    4855                0.92

Predicted protein(s):
>FGENESH-2   1   6 exon (s)   1634  -   4735    215 aa, chain +
MSSSEEVSWISWFCGLRGNEFFCEVDEDYIQDKFNLTGLNEQVPHYRQALDMILDLEPDE
ELEDNPNQSDLIEQAAEMLYGLIHARYILTNRGIAQMLEKYQQGDFGYCPRVYCENQPML
PIGLSDIPGEAMVKLYCPKCMDVYTPKSSRHHHTDGAYFGTGFPHMLFMVHPEYRPKRPA
NQFVPRLYGFKIHPMAYQLQLQAASNFKSPVKTIR
 FGENESH-2 1.C Prediction of potential genes in 2nd genomic DNA
 Time:   Fri Nov 10 02:55:51 2000
 Seq name: MMGMCK2B
 Length of sequence:  7874  GC content: 51 Zone: 2
 Number of predicted genes 1 in +chain 1 in -chain 0
 Number of predicted exons 6 in +chain 6 in -chain 0
 Positions of predicted genes and exons:
  G Str Feature    Start     End   Score        ORF           Len

  1 +   1 CDSf    2169 -    2240     38.64    2169 -    2240     72
  1 +   2 CDSi    2829 -    2931     28.70    2829 -    2930    102
  1 +   3 CDSi    4112 -    4227     36.45    4114 -    4227    114
  1 +   4 CDSi    4615 -    4690     18.76    4615 -    4689     75
  1 +   5 CDSi    4801 -    4990     56.00    4803 -    4988    186
  1 +   6 CDSl    6262 -    6352     18.70    6263 -    6352     90
  1 +     PolA    6470                0.92

Predicted protein(s):
>FGENESH-2   1   6 exon (s)   2169  -   6352    215 aa, chain +
MSSSEEVSWISWFCGLRGNEFFCEVDEDYIQDKFNLTGLNEQVPHYRQALDMILDLEPDE
ELEDNPNQSDLIEQAAEMLYGLIHARYILTNRGIAQMLEKYQQGDFGYCPRVYCENQPML
PIGLSDIPGEAMVKLYCPKCMDVYTPKSSRHHHTDGAYFGTGFPHMLFMVHPEYRPKRPA
NQFVPRLYGFKIHPMAYQLQLQAASNFKSPVKTIR
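
A report like the one above can be checked mechanically for agreement between the two predictions. The following is a minimal sketch, not part of FGENESH-2: the names parse_exons and exon_spans are hypothetical, and the row layout is assumed to match the example output shown here (other FGENESH versions may format rows differently). It reads the exon rows under each "Seq name:" header and compares exon spans (End - Start + 1) between the two sequences.

```python
import re

# Exon rows in the example look like:
#   1 +   1 CDSf    1634 -    1705     18.99    1634 -    1705     72
# Assumption: feature labels start with "CDS" plus one letter (CDSf, CDSi, CDSl here).
EXON_ROW = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\d+)\s+(CDS\w)\s+(\d+)\s+-\s+(\d+)"
    r"\s+(-?\d+\.\d+)\s+(\d+)\s+-\s+(\d+)\s+(\d+)\s*$"
)


def parse_exons(report):
    """Map each 'Seq name:' value to the list of exon records that follow it."""
    blocks = {}
    current = None
    for line in report.splitlines():
        stripped = line.strip()
        if stripped.startswith("Seq name:"):
            current = stripped.split(":", 1)[1].strip()
            blocks[current] = []
            continue
        match = EXON_ROW.match(line)
        if match and current is not None:
            gene, strand, number, feature, start, end, score = match.groups()[:7]
            blocks[current].append({
                "gene": int(gene),
                "strand": strand,
                "exon": int(number),
                "feature": feature,
                "start": int(start),
                "end": int(end),
                "score": float(score),
            })
    return blocks


def exon_spans(exons):
    """Exon lengths in bases, computed from the Start and End columns."""
    return [e["end"] - e["start"] + 1 for e in exons]


# Rows copied from the example report; PolA rows are ignored by the parser.
SAMPLE = """\
 Seq name: HSCKIIBE
  1 +   1 CDSf    1634 -    1705     18.99    1634 -    1705     72
  1 +   2 CDSi    2672 -    2774     38.26    2672 -    2773    102
  1 +     PolA    4855                0.92
 Seq name: MMGMCK2B
  1 +   1 CDSf    2169 -    2240     38.64    2169 -    2240     72
  1 +   2 CDSi    2829 -    2931     28.70    2829 -    2930    102
  1 +     PolA    6470                0.92
"""

if __name__ == "__main__":
    blocks = parse_exons(SAMPLE)
    human = exon_spans(blocks["HSCKIIBE"])
    mouse = exon_spans(blocks["MMGMCK2B"])
    print("human exon spans:", human)
    print("mouse exon spans:", mouse)
    print("same exon structure:", human == mouse)
```

Run over the full report, the same comparison covers all six exons of each gene. Matching exon spans in both organisms are consistent with the identical 215 aa proteins predicted above.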

