Finding Operons and genes in Bacillus

Victor victor at softberry.com
Wed Sep 4 02:37:11 EST 2002


FGENESB - Finding operons and genes in microbial genomes

The current FgenesB version implements a simple operon prediction
model based on the distances between genes. It accurately recognizes
about 70% of single transcription units and defines about 43% of
operons exactly (~92% partially). Work is under way to improve the
accuracy of operon identification using promoters, terminators and
other features.
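
To make the distance-based idea concrete, here is a minimal sketch in
Python. It is not the FgenesB model: the rule (adjacent genes on the same
strand separated by at most MAX_GAP bp share a transcription unit), the
100 bp cutoff and the function name group_into_units are all assumptions
made for illustration. The coordinates are the first ten genes of the
Bacillus anthracis sample in this post.

```python
from typing import List, Tuple

# (gene number, start, end, strand)
Gene = Tuple[int, int, int, str]

# Assumed intergenic cutoff in bp; NOT a documented FgenesB parameter.
MAX_GAP = 100


def group_into_units(genes: List[Gene], max_gap: int = MAX_GAP) -> List[List[int]]:
    """Group genes into transcription units by strand and intergenic distance."""
    units: List[List[int]] = []
    prev = None
    for gene in sorted(genes, key=lambda g: g[1]):
        number, start, end, strand = gene
        # Gap is negative when neighbouring genes overlap.
        same_unit = (
            prev is not None
            and strand == prev[3]
            and start - prev[2] - 1 <= max_gap
        )
        if same_unit:
            units[-1].append(number)
        else:
            units.append([number])
        prev = gene
    return units


genes = [
    (1, 273, 953, "+"), (2, 1049, 2044, "+"), (3, 2031, 2444, "-"),
    (4, 2552, 3904, "-"), (5, 4179, 4412, "+"), (6, 4525, 4869, "-"),
    (7, 5122, 6312, "-"), (8, 6309, 6806, "-"), (9, 6954, 7916, "+"),
    (10, 8026, 8865, "+"),
]

units = group_into_units(genes)
# Units with more than one gene are treated as operons in this sketch.
operons = [u for u in units if len(u) > 1]
print(units)
print(operons)
```

With this assumed cutoff the sketch groups genes 1-2 and 7-8 into operons
and leaves the other genes as single transcription units. A real
predictor would estimate the distance model from data rather than use a
fixed cutoff.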
 
We have developed a new FgenesB-annotator script that finds similar
proteins in public databases and annotates the predicted genes. The
script can also discover additional low-scoring genes if they have a
known homologous protein.
 
For an example annotation of the Bacillus anthracis A2012 main
chromosome, see:
 
 
 http://www.softberry.com/bact/ba.ann
 
  FgenesB-annotator:  Finding operons and genes in microbial genomes (Softberry Inc.)
 Seq name: gi|20520073|gb|AAAC01000001.1| Bacillus anthracis A2012 main chromosome, whole genome shotgun sequence
  Length of sequence - 5093554 bp  Parameters: Bacillus anthracis
  Number of predicted genes - 5917, with homology - 5480
  Number of transcription units - 3568, operons - 1224
      N      Tu/Op    S             Start         End    Score

      1     1 Op  1   +    CDS        273 -       953    692 ## MgtC, MgtC family [Bacillus anthracis A2012] [Bacillus anthr
      2     1 Op  2   +    CDS       1049 -      2044    625 ## Similar_to_GB_hypothetical
      3     2 Tu  1   -    CDS       2031 -      2444    461 ## Similar_to_GB_hypothetical
      4     3 Tu  1   -    CDS       2552 -      3904   1599 ## PGI, Phosphoglucose isomerase [Bacillus anthracis A2012] [Ba
      5     4 Tu  1   +    CDS       4179 -      4412    393 ## Similar_to_GB_hypothetical
      6     5 Tu  1   -    CDS       4525 -      4869    470 ## S1, Ribosomal protein S1-like RNA-binding domain [Bacillus a
      7     6 Op  1   -    CDS       5122 -      6312   1010 ## aminotran_1_2, Aminotransferase class I and II [Bacillus ant
      8     6 Op  2   -    CDS       6309 -      6806    639 ## ASNC_trans_reg, AsnC family [Bacillus anthracis A2012] [Baci
      9     7 Tu  1   +    CDS       6954 -      7916   1144 ## 2-Hacid_DH_C, D-isomer specific 2-hydroxyacid dehydrogenase,
     10     8 Tu  1   +    CDS       8026 -      8865    644 ## abhydrolase, alpha/beta hydrolase fold [Bacillus anthracis A
     11     9 Tu  1   -    CDS       8895 -      9146    292 ## Similar_to_GB_hypothetical
     12    10 Tu  1   +    CDS       9264 -     10415    886 ## aminotran_1_2, Aminotransferase class I and II [Bacillus ant
     13    11 Tu  1   -    CDS      10600 -     11097    539 ## sodcu, Copper/zinc superoxide dismutase (SODC) [Bacillus ant
     14    12 Tu  1   -    CDS      11208 -     11384    264 ## Similar_to_GB_hypothetical
     15    13 Tu  1   +    CDS      11550 -     11933    526 ## Similar_to_GB_hypothetical
     16    14 Tu  1   -    CDS      11975 -     12598    605 ## EXOIII, exonuclease domain in DNA-polymerase alpha and epsil
     17    15 Tu  1   +    CDS      12888 -     14213   1615 ## ArsB, Arsenical pump membrane protein [Bacillus anthracis A2
     18    16 Tu  1   -    CDS      14272 -     14739    418 ## Similar_to_GB_hypothetical
     19    17 Tu  1   +    CDS      14858 -     15571    661 ## Similar_to_GB_hypothetical
     20    18 Tu  1   +    CDS      15919 -     17295   1497 ## HGTP_anticodon, Anticodon binding domain [Bacillus anthracis
     21    19 Tu  1   -    CDS      17333 -     17716    496 ## DUF157, Uncharacterized protein PaaI, COG2050 [Bacillus anth
     22    20 Op  1   +    CDS      17812 -     18555    500 ## Similar_to_GB_hypothetical
     23    20 Op  2   +    CDS      18606 -     19199    756 ## BioY, BioY family [Bacillus anthracis A2012] [Bacillus anthr
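
For readers who want to post-process such a listing, the following
Python sketch pulls the numeric fields out of each gene record. The
record layout is inferred only from the sample above, not from any
FgenesB format specification, and the names Gene, RECORD and parse_genes
are hypothetical. The pattern tolerates records wrapped over several
lines and mail quote markers ("> "), since mailers often reflow this
output.

```python
import re
from typing import List, NamedTuple


class Gene(NamedTuple):
    n: int        # gene number
    unit: int     # transcription unit number
    kind: str     # "Op" (operon member) or "Tu" (single transcription unit)
    pos: int      # position of the gene inside its unit
    strand: str   # "+" or "-"
    start: int
    end: int
    score: int


# Layout assumed from the sample: N, unit, Op/Tu, position, strand,
# "CDS", start - end, score, then "##" and a free-text description.
RECORD = re.compile(
    r"\b(\d+)\s+(\d+)\s+(Op|Tu)\s+(\d+)\s+([+-])\s+CDS"
    r"\s+(\d+)\s+-\s+(\d+)\s+(\d+)\s+##"
)


def parse_genes(text: str) -> List[Gene]:
    """Extract gene records from FgenesB-annotator-style text."""
    # Drop mail quote markers at the start of lines.
    cleaned = re.sub(r"(?m)^[ \t]*>[ \t]?", "", text)
    genes = []
    for m in RECORD.finditer(cleaned):
        n, unit, kind, pos, strand, start, end, score = m.groups()
        genes.append(
            Gene(int(n), int(unit), kind, int(pos), strand,
                 int(start), int(end), int(score))
        )
    return genes


sample = """
      7     6 Op  1   -    CDS       5122 -      6312
   1010 ##
 aminotran_1_2, Aminotransferase class I and II
>     17    15 Tu  1   +    CDS      12888 -     14213
>   1615 ## ArsB,
> Arsenical pump membrane protein [Bacillus anthracis
"""

parsed = parse_genes(sample)
for gene in parsed:
    print(gene)
```

The description after "##" is deliberately ignored here because it is
truncated in the listing and cannot be recovered reliably.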
 
 
 
The new FgenesB is the fastest (E. coli genome analyzed in ~14 sec)
and most accurate ab initio bacterial gene prediction program
available.
 
 http://www.softberry.com/berry.phtml?topic=fgenesb
It uses parameters learned for different bacteria by the FgenesB-train
script, whose only input is the new bacterial sequence. The script
automatically creates a file with gene prediction parameters for the
analyzed organism. It takes only ~10 minutes to create such a file for
a genome like E. coli from its sequence. If you need parameters for a
new bacterium, please contact Softberry Inc.; we can include them in
the web list.
 
 
The algorithm is based on pattern recognition of different types of
signals and Markov chain models of coding regions. The optimal
combination of these features is then found by dynamic programming,
and a set of gene models is constructed along the given sequence.
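
The dynamic programming step can be illustrated with a much simplified
Python sketch. It is not FgenesB's algorithm: it assumes each candidate
gene already carries a single combined score (standing in for the signal
and Markov chain scores), forbids any overlap between chosen genes
(real bacterial genes can overlap slightly), and uses the hypothetical
name best_gene_set. Under those assumptions it is plain weighted
interval scheduling.

```python
from bisect import bisect_right
from typing import List, Tuple

# (start, end, score); the score is a stand-in for combined feature scores.
Candidate = Tuple[int, int, float]


def best_gene_set(candidates: List[Candidate]) -> Tuple[float, List[Candidate]]:
    """Pick the non-overlapping candidates with the highest total score."""
    cands = sorted(candidates, key=lambda c: c[1])
    ends = [c[1] for c in cands]
    n = len(cands)
    best = [0.0] * (n + 1)    # best[i]: best total using the first i candidates
    take = [False] * (n + 1)  # take[i]: whether candidate i is used in best[i]
    prev = [0] * (n + 1)      # prev[i]: count of earlier candidates ending before i starts
    for i, (start, _end, score) in enumerate(cands, 1):
        p = bisect_right(ends, start - 1, 0, i - 1)
        prev[i] = p
        with_i = best[p] + score
        if with_i > best[i - 1]:
            best[i], take[i] = with_i, True
        else:
            best[i] = best[i - 1]
    # Trace back through the table to recover the chosen candidates.
    chosen: List[Candidate] = []
    i = n
    while i > 0:
        if take[i]:
            chosen.append(cands[i - 1])
            i = prev[i]
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen


# Made-up candidates for illustration only.
candidates = [(1, 100, 5.0), (50, 200, 8.0), (150, 300, 4.0), (250, 400, 3.0)]
total, chosen = best_gene_set(candidates)
print(total, chosen)
```

Here the second and fourth candidates are selected, for a total score of
11.0; every other non-overlapping combination scores lower.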
 
 -------------------------------------------------------

=====



Moderated
bionet.genome.gene-structure


