Finding Operons and genes in Bacillus anthracis A2012

Victor victor at softberry.com
Wed Sep 4 02:30:40 EST 2002


  FGENESB - Finding operons and genes in microbial genomes

The current FgenesB version implements a simple operon prediction model
based on the distances between genes. It accurately recognizes 70% of single
transcription units and exactly defines about 43% of operons (~92% at least
partially). Work is under way to increase the accuracy of operon
identification using promoters, terminators and other features.
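
As a rough illustration of a distance-based model, the sketch below groups
adjacent genes on the same strand into one transcription unit whenever the
gap between them is small. This is not FgenesB's actual model: the function
name, the tuple layout and the 100 bp cutoff are all assumptions made for
the example.

```python
from typing import List, Tuple

# (start, end, strand); coordinates are 1-based and inclusive.
Gene = Tuple[int, int, str]


def group_by_distance(genes: List[Gene], max_gap: int = 100) -> List[List[Gene]]:
    """Group neighbouring same-strand genes into transcription units.

    max_gap is a hypothetical cutoff in bp, not a published FgenesB parameter.
    """
    units: List[List[Gene]] = []
    for gene in sorted(genes):  # sort by start coordinate
        if units:
            prev = units[-1][-1]
            gap = gene[0] - prev[1] - 1  # negative when the genes overlap
            if gene[2] == prev[2] and gap <= max_gap:
                units[-1].append(gene)
                continue
        units.append([gene])
    return units


# Coordinates of the first four genes in the B. anthracis excerpt.
genes = [(273, 953, "+"), (1049, 2044, "+"), (2031, 2444, "-"), (2552, 3904, "-")]
units = group_by_distance(genes)
```

With this cutoff the first two genes form one operon and the other two are
single-gene units, which agrees with the annotation of those four genes.
Units holding more than one gene correspond to operons.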

We have developed a new FgenesB-annotator script that finds similar proteins
in public databases and annotates the predicted genes. The script can also
discover additional low-scoring genes if they have known homologous proteins.

An example annotation of the Bacillus anthracis A2012 main chromosome is
available at

http://www.softberry.com/bact/ba.ann

 FgenesB-annotator:  Finding operons and genes in microbial genomes (Softberry Inc.)
Seq name: gi|20520073|gb|AAAC01000001.1| Bacillus anthracis A2012 main chromosome, whole genome shotgun sequence
 Length of sequence - 5093554 bp  Parameters: Bacillus anthracis
 Number of predicted genes - 5917, with homology - 5480
 Number of transcription units - 3568, operons - 1224
     N      Tu/Op    S             Start         End    Score

     1     1 Op  1   +    CDS        273 -       953    692 ## MgtC, MgtC family [Bacillus anthracis A2012] [Bacillus anthr
     2     1 Op  2   +    CDS       1049 -      2044    625 ## Similar_to_GB_hypothetical
     3     2 Tu  1   -    CDS       2031 -      2444    461 ## Similar_to_GB_hypothetical
     4     3 Tu  1   -    CDS       2552 -      3904   1599 ## PGI, Phosphoglucose isomerase [Bacillus anthracis A2012] [Ba
     5     4 Tu  1   +    CDS       4179 -      4412    393 ## Similar_to_GB_hypothetical
     6     5 Tu  1   -    CDS       4525 -      4869    470 ## S1, Ribosomal protein S1-like RNA-binding domain [Bacillus a
     7     6 Op  1   -    CDS       5122 -      6312   1010 ## aminotran_1_2, Aminotransferase class I and II [Bacillus ant
     8     6 Op  2   -    CDS       6309 -      6806    639 ## ASNC_trans_reg, AsnC family [Bacillus anthracis A2012] [Baci
     9     7 Tu  1   +    CDS       6954 -      7916   1144 ## 2-Hacid_DH_C, D-isomer specific 2-hydroxyacid dehydrogenase,
    10     8 Tu  1   +    CDS       8026 -      8865    644 ## abhydrolase, alpha/beta hydrolase fold [Bacillus anthracis A
    11     9 Tu  1   -    CDS       8895 -      9146    292 ## Similar_to_GB_hypothetical
    12    10 Tu  1   +    CDS       9264 -     10415    886 ## aminotran_1_2, Aminotransferase class I and II [Bacillus ant
    13    11 Tu  1   -    CDS      10600 -     11097    539 ## sodcu, Copper/zinc superoxide dismutase (SODC) [Bacillus ant
    14    12 Tu  1   -    CDS      11208 -     11384    264 ## Similar_to_GB_hypothetical
    15    13 Tu  1   +    CDS      11550 -     11933    526 ## Similar_to_GB_hypothetical
    16    14 Tu  1   -    CDS      11975 -     12598    605 ## EXOIII, exonuclease domain in DNA-polymerase alpha and epsil
    17    15 Tu  1   +    CDS      12888 -     14213   1615 ## ArsB, Arsenical pump membrane protein [Bacillus anthracis A2
    18    16 Tu  1   -    CDS      14272 -     14739    418 ## Similar_to_GB_hypothetical
    19    17 Tu  1   +    CDS      14858 -     15571    661 ## Similar_to_GB_hypothetical
    20    18 Tu  1   +    CDS      15919 -     17295   1497 ## HGTP_anticodon, Anticodon binding domain [Bacillus anthracis
    21    19 Tu  1   -    CDS      17333 -     17716    496 ## DUF157, Uncharacterized protein PaaI, COG2050 [Bacillus anth
    22    20 Op  1   +    CDS      17812 -     18555    500 ## Similar_to_GB_hypothetical
    23    20 Op  2   +    CDS      18606 -     19199    756 ## BioY, BioY family [Bacillus anthracis A2012] [Bacillus anthr
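
For readers who want to process such a listing, here is a minimal parsing
sketch. It assumes each gene record sits on a single line (mail-wrapped
continuation lines must be rejoined first), and the field names are guesses
read off the column header rather than an official format description.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Field meanings are inferred from the listing header; treat them as assumptions.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(Op|Tu)\s+(\d+)\s+([+-])\s+CDS\s+"
    r"(\d+)\s+-\s+(\d+)\s+(\d+)\s+##\s*(.*)$"
)


@dataclass
class GeneRow:
    number: int       # gene number (column N)
    unit: int         # transcription unit number
    kind: str         # "Op" = operon, "Tu" = single transcription unit
    position: int     # position of the gene inside its unit
    strand: str       # "+" or "-"
    start: int
    end: int
    score: int
    description: str  # homology annotation, possibly truncated


def parse_row(line: str) -> Optional[GeneRow]:
    """Return a GeneRow for a gene line, or None for any other line."""
    m = ROW.match(line)
    if m is None:
        return None
    n, unit, kind, pos, strand, start, end, score, desc = m.groups()
    return GeneRow(int(n), int(unit), kind, int(pos), strand,
                   int(start), int(end), int(score), desc.strip())


row = parse_row(
    "     8     6 Op  2   -    CDS       6309 -      6806    639 ## "
    "ASNC_trans_reg, AsnC family [Bacillus anthracis A2012] [Baci"
)
```

Lines that are not gene records, such as the header or the summary counts,
yield None and can simply be skipped.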



The new FgenesB is the fastest (the E. coli genome is analyzed in ~14 sec)
and most accurate ab initio bacterial gene prediction program available.

	http://www.softberry.com/berry.phtml?topic=fgenesb

It uses parameters learned for different bacteria by the FgenesB-train script,
whose only input is the new bacterial sequence. The script automatically
creates a file with gene prediction parameters for the analyzed organism.
It takes only ~10 minutes to create such a file for a genome such as
E. coli from its sequence. If you need parameters for a new bacterium,
please contact Softberry Inc.; we can include them in the web list.


The algorithm is based on pattern recognition of different types of signals
and Markov chain models of coding regions. The optimal combination of these
features is then found by dynamic programming, and a set of gene models
is constructed along the given sequence.
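
To give a feel for the dynamic programming step, the sketch below picks the
highest-scoring set of non-overlapping candidate genes from a scored list.
It is a textbook weighted interval scheduling routine used here as a
stand-in, not the FgenesB algorithm: the candidate scores are made up, and
real bacterial genes may overlap slightly, which this simplification
forbids.

```python
from bisect import bisect_right
from typing import List, Tuple

# (start, end, score); the scores are hypothetical combined signal/coding scores.
Candidate = Tuple[int, int, float]


def best_gene_set(cands: List[Candidate]) -> List[Candidate]:
    """Choose non-overlapping candidates with the maximum total score."""
    cands = sorted(cands, key=lambda c: c[1])  # order by end coordinate
    ends = [c[1] for c in cands]
    best = [0.0] * (len(cands) + 1)  # best[i]: optimum over the first i candidates
    taken = [False] * len(cands)
    compatible = []                  # how many earlier candidates end before this start
    for i, (start, _end, score) in enumerate(cands):
        p = bisect_right(ends, start - 1, 0, i)
        compatible.append(p)
        take, skip = best[p] + score, best[i]
        taken[i] = take > skip
        best[i + 1] = max(take, skip)
    # Trace back through the table to recover the chosen candidates.
    chosen = []
    i = len(cands)
    while i > 0:
        if taken[i - 1]:
            chosen.append(cands[i - 1])
            i = compatible[i - 1]
        else:
            i -= 1
    return chosen[::-1]


chosen = best_gene_set([(1, 100, 5.0), (50, 150, 8.0), (120, 200, 4.0)])
```

In this toy input the middle candidate has the highest individual score,
but the two outer candidates together score more, so they are selected.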
