Finding Operons and genes in Bacillus
Victor
victor at softberry.com
Wed Sep 4 02:37:11 EST 2002
FGENESB - Finding operons and genes in microbial genomes
The current FgenesB version implements a simple operon prediction
model based on the distances between genes. It accurately recognizes
70% of single transcription units and exactly defines about 43% of
operons (~92% partially). Work is under way to increase the accuracy
of operon identification using promoters, terminators and other
features.
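As a rough illustration of distance-based operon prediction, the
sketch below groups adjacent genes into one transcription unit when
they are on the same strand and separated by at most max_gap bp. This
is not the FgenesB model: the Gene class, the function name and the
100 bp cutoff are assumptions made for the example. The coordinates
are the first four genes of the Bacillus anthracis example in this
message.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def group_transcription_units(genes: List[Gene], max_gap: int = 100) -> List[List[Gene]]:
    """Group genes into transcription units by strand and distance.

    Adjacent genes join the same unit when they lie on the same strand
    and the gap between them is at most max_gap bp (an overlap gives a
    negative gap). max_gap is a hypothetical cutoff, not a published
    FgenesB parameter.
    """
    units: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if units:
            prev = units[-1][-1]
            gap = gene.start - prev.end - 1
            if gene.strand == prev.strand and gap <= max_gap:
                units[-1].append(gene)
                continue
        units.append([gene])
    return units


genes = [
    Gene(273, 953, "+"),
    Gene(1049, 2044, "+"),
    Gene(2031, 2444, "-"),
    Gene(2552, 3904, "-"),
]
units = group_transcription_units(genes)
operons = [u for u in units if len(u) > 1]
print(len(units), "transcription units,", len(operons), "operon(s)")
# prints: 3 transcription units, 1 operon(s)
```

A unit with more than one gene is counted as an operon here; a real
predictor would probably weigh the distance rather than apply a hard
cutoff.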
We have developed a new fgenesB-annotator script that finds similar
proteins in public databases and annotates the predicted genes. The
script can also discover additional low-scoring genes if they have a
known homologous protein.
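The rescue of low-scoring genes can be pictured as a simple filter:
keep a candidate if its score is high enough, or if a database search
found a homolog. The sketch below only illustrates that idea; the
Candidate type, the select_genes function and the score cutoff of 300
are hypothetical and do not come from fgenesB-annotator.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    score: float
    has_homolog: bool  # True if a database search found a similar protein


def select_genes(candidates: List[Candidate], min_score: float = 300.0) -> List[Candidate]:
    """Keep confident predictions plus low-scoring ones backed by homology."""
    return [c for c in candidates if c.score >= min_score or c.has_homolog]


candidates = [
    Candidate("orf_a", 692.0, True),
    Candidate("orf_b", 120.0, True),   # low score, rescued by its homolog
    Candidate("orf_c", 95.0, False),   # low score, no homolog: dropped
]
kept = [c.name for c in select_genes(candidates)]
print(kept)  # prints: ['orf_a', 'orf_b']
```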
For an example, see the annotation of the Bacillus anthracis A2012
main chromosome at
http://www.softberry.com/bact/ba.ann
FgenesB-annotator: Finding operons and genes in microbial genomes
(Softberry Inc.)
Seq name: gi|20520073|gb|AAAC01000001.1| Bacillus anthracis A2012 main chromosome, whole genome shotgun sequence
Length of sequence - 5093554 bp
Parameters: Bacillus anthracis
Number of predicted genes - 5917, with homology - 5480
Number of transcription units - 3568, operons - 1224
  N   Tu/Op   S       Start      End Score
  1   1 Op 1  +  CDS    273 -    953   692  ## MgtC, MgtC family [Bacillus anthracis A2012] [Bacillus anthr
  2   1 Op 2  +  CDS   1049 -   2044   625  ## Similar_to_GB_hypothetical
  3   2 Tu 1  -  CDS   2031 -   2444   461  ## Similar_to_GB_hypothetical
  4   3 Tu 1  -  CDS   2552 -   3904  1599  ## PGI, Phosphoglucose isomerase [Bacillus anthracis A2012] [Ba
  5   4 Tu 1  +  CDS   4179 -   4412   393  ## Similar_to_GB_hypothetical
  6   5 Tu 1  -  CDS   4525 -   4869   470  ## S1, Ribosomal protein S1-like RNA-binding domain [Bacillus a
  7   6 Op 1  -  CDS   5122 -   6312  1010  ## aminotran_1_2, Aminotransferase class I and II [Bacillus ant
  8   6 Op 2  -  CDS   6309 -   6806   639  ## ASNC_trans_reg, AsnC family [Bacillus anthracis A2012] [Baci
  9   7 Tu 1  +  CDS   6954 -   7916  1144  ## 2-Hacid_DH_C, D-isomer specific 2-hydroxyacid dehydrogenase,
 10   8 Tu 1  +  CDS   8026 -   8865   644  ## abhydrolase, alpha/beta hydrolase fold [Bacillus anthracis A
 11   9 Tu 1  -  CDS   8895 -   9146   292  ## Similar_to_GB_hypothetical
 12  10 Tu 1  +  CDS   9264 -  10415   886  ## aminotran_1_2, Aminotransferase class I and II [Bacillus ant
 13  11 Tu 1  -  CDS  10600 -  11097   539  ## sodcu, Copper/zinc superoxide dismutase (SODC) [Bacillus ant
 14  12 Tu 1  -  CDS  11208 -  11384   264  ## Similar_to_GB_hypothetical
 15  13 Tu 1  +  CDS  11550 -  11933   526  ## Similar_to_GB_hypothetical
 16  14 Tu 1  -  CDS  11975 -  12598   605  ## EXOIII, exonuclease domain in DNA-polymerase alpha and epsil
 17  15 Tu 1  +  CDS  12888 -  14213  1615  ## ArsB, Arsenical pump membrane protein [Bacillus anthracis A2
 18  16 Tu 1  -  CDS  14272 -  14739   418  ## Similar_to_GB_hypothetical
 19  17 Tu 1  +  CDS  14858 -  15571   661  ## Similar_to_GB_hypothetical
 20  18 Tu 1  +  CDS  15919 -  17295  1497  ## HGTP_anticodon, Anticodon binding domain [Bacillus anthracis
 21  19 Tu 1  -  CDS  17333 -  17716   496  ## DUF157, Uncharacterized protein PaaI, COG2050 [Bacillus anth
 22  20 Op 1  +  CDS  17812 -  18555   500  ## Similar_to_GB_hypothetical
 23  20 Op 2  +  CDS  18606 -  19199   756  ## BioY, BioY family [Bacillus anthracis A2012] [Bacillus anthr
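For readers who want to post-process such output, here is a minimal
parsing sketch. It assumes each record sits on a single line with the
fields in the order shown in the example (gene number, unit number,
Op or Tu, position in the unit, strand, CDS, start - end, score,
optional ## annotation). The Record type, the regular expression and
parse_row are written for this message only; they are not part of any
Softberry tool, and the real file layout may differ.

```python
import re
from typing import NamedTuple, Optional


class Record(NamedTuple):
    n: int
    unit: int
    unit_type: str  # "Op" (operon) or "Tu" (single transcription unit)
    position: int   # position of the gene inside its unit
    strand: str
    start: int
    end: int
    score: int
    note: str


# Assumed one-record-per-line layout; whitespace between fields is flexible.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(Op|Tu)\s+(\d+)\s+([+-])\s+CDS\s+"
    r"(\d+)\s+-\s+(\d+)\s+(\d+)\s*(?:##\s*(.*))?$"
)


def parse_row(line: str) -> Optional[Record]:
    """Return a Record for a gene line, or None for headers and other text."""
    m = ROW.match(line)
    if m is None:
        return None
    n, unit, unit_type, pos, strand, start, end, score, note = m.groups()
    return Record(int(n), int(unit), unit_type, int(pos), strand,
                  int(start), int(end), int(score), (note or "").strip())


rec = parse_row(
    "7 6 Op 1 - CDS 5122 - 6312 1010 ## aminotran_1_2, Aminotransferase class I and II"
)
print(rec.unit_type, rec.strand, rec.end - rec.start + 1, rec.score)
# prints: Op - 1191 1010
```

Lines that do not match, such as the column header, return None and
can simply be skipped.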
The new FgenesB is the fastest (the E. coli genome is analyzed in
~14 sec) and most accurate ab initio bacterial gene prediction
program available.
http://www.softberry.com/berry.phtml?topic=fgenesb
It uses parameters learned for different bacteria by the
FgenesB-train script, whose only input is the new bacterial sequence.
The script automatically creates a file with gene prediction
parameters for the analyzed organism. It takes only ~10 minutes to
create such a file for a genome such as E. coli from its sequence.
If you need parameters for a new bacterium, please contact Softberry
Inc.; we can include them in the web list.
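To give an idea of what learning parameters from a sequence can mean
for a Markov chain model, the sketch below estimates first-order
nucleotide transition probabilities from one sequence, with a
pseudocount to avoid zero probabilities. It is a generic textbook
estimate, not the FgenesB-train procedure; the function name, the
model order and the toy sequence are assumptions.

```python
from typing import Dict

ALPHABET = "ACGT"


def train_markov(sequence: str, pseudocount: float = 1.0) -> Dict[str, Dict[str, float]]:
    """Estimate first-order transition probabilities P(next | current)."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    seq = sequence.upper()
    for cur, nxt in zip(seq, seq[1:]):
        # Skip pairs containing ambiguity codes such as N.
        if cur in counts and nxt in counts[cur]:
            counts[cur][nxt] += 1.0
    model: Dict[str, Dict[str, float]] = {}
    for cur, row in counts.items():
        total = sum(row.values())
        model[cur] = {nxt: c / total for nxt, c in row.items()}
    return model


# Toy sequence for illustration only; real training uses a whole genome.
model = train_markov("ATGAAAGCGATTGCGCTGAAATAA")
for cur in ALPHABET:
    print(cur, {nxt: round(p, 2) for nxt, p in model[cur].items()})
```

A coding-region model would normally be of higher order and keep
separate tables for each codon position; this version only shows the
counting and normalization step.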
The algorithm is based on pattern recognition of different types of
signals and on Markov chain models of coding regions. The optimal
combination of these features is then found by dynamic programming,
and a set of gene models is constructed along the given sequence.
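The dynamic programming step can be illustrated with a much simpler
stand-in: given scored candidate genes, choose the non-overlapping
subset with the highest total score (weighted interval scheduling).
This is a sketch under strong assumptions, not the FgenesB algorithm:
it ignores strands and signals and forbids all overlaps, whereas real
bacterial genes can overlap slightly. The candidate tuples and scores
below are invented.

```python
from bisect import bisect_right
from typing import List, Tuple

Candidate = Tuple[int, int, float]  # (start, end, score), inclusive coordinates


def best_gene_set(candidates: List[Candidate]) -> Tuple[float, List[Candidate]]:
    """Pick the non-overlapping candidates with the highest total score."""
    cands = sorted(candidates, key=lambda c: c[1])  # sort by end coordinate
    ends = [c[1] for c in cands]
    # best[i] = (total score, chosen genes) using only the first i candidates
    best: List[Tuple[float, List[Candidate]]] = [(0.0, [])]
    for i, (start, end, score) in enumerate(cands):
        # k = number of earlier candidates that end before this one starts
        k = bisect_right(ends, start - 1, 0, i)
        take_score = best[k][0] + score
        if take_score > best[i][0]:
            best.append((take_score, best[k][1] + [(start, end, score)]))
        else:
            best.append(best[i])
    return best[-1]


candidates = [
    (1, 300, 5.0),
    (250, 900, 9.0),
    (901, 1500, 4.0),
    (1200, 1500, 6.0),
]
total, chosen = best_gene_set(candidates)
print(total, chosen)
# prints: 15.0 [(250, 900, 9.0), (1200, 1500, 6.0)]
```

Each cell stores the chosen list for readability; a production version
would keep back-pointers instead.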
-------------------------------------------------------
=====
Moderated
bionet.genome.gene-structure