FGENESV - Finding Genes in genomes of RNA
Victor
victor at softberry.com
Wed Oct 23 02:12:52 EST 2002
The program is available for online use at:
http://www.softberry.com/berry.phtml?topic=gfindv
Method description:
The FGENESV algorithm is based on pattern recognition of different
types of signals and on Markov chain models of coding regions. The
optimal combination of these features is then found by dynamic
programming, and a set of gene models is constructed along the given
sequence. FGENESV is the fastest ab initio viral gene prediction
program available.
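To illustrate the idea of scoring a region with Markov chain models,
here is a minimal sketch in Python. It is not the FGENESV code: the
first-order models, the add-one smoothing, the toy training strings and
the function names train_markov and log_odds are all assumptions made
for illustration. Real gene finders typically use higher-order,
frame-dependent models trained on much more data.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(seqs, order=1):
    """Estimate P(base | previous `order` bases) with add-one smoothing.

    Hypothetical helper for illustration, not part of FGENESV.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1

    def prob(context, base):
        row = counts[context]
        total = sum(row.values())
        return (row[base] + 1) / (total + len(ALPHABET))

    return prob

def log_odds(seq, coding, background, order=1):
    """Sum of log(P_coding / P_background) over all positions of seq."""
    score = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        score += math.log(coding(context, base) / background(context, base))
    return score

# Toy training data (assumed, for demonstration only).
coding_model = train_markov(["ATGGCGGCGGCG"])
background_model = train_markov(["ATATATATAT"])

gc_score = log_odds("GCGGCG", coding_model, background_model)
at_score = log_odds("ATATAT", coding_model, background_model)
```

A positive score means the region looks more like the "coding" training
data than the background; a dynamic programming step could then choose
the combination of candidate regions with the best total score.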
We have developed two variants of gene prediction: FGENESV0 (suited to
small genomes, < 10000 bp) uses generic parameters of coding regions,
whereas FGENESV learns genome-specific parameters from the input viral
genome sequence alone.
FGENESV predicts the intron-less genes of viruses. However, a few
percent of viral genes contain introns. Such genes are often
alternatives to the intron-less variant. To find them, please use a
standard eukaryotic gene-finding program (such as FGENESH) in addition
to FGENESV.
As additional parameters, you can choose the Linear or Circular form of
your virus and select an alternative genetic code (the Standard code is
the default): the Bacterial and Plant Plastid Code (transl_table=11),
or the Mold, Protozoan, and Coelenterate Mitochondrial Code and the
Mycoplasma/Spiroplasma Code (transl_table=4).
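To show why these two options matter, here is a small Python sketch. It
is only an illustration and says nothing about how FGENESV handles the
options internally; the names build_table, translate and
circular_window are made up for this example. The codon assignments
follow the NCBI translation tables: table 11 assigns the same amino
acids as the Standard code (it differs in the permitted start codons),
while table 4 reads TGA as Trp instead of stop.

```python
BASES = "TCAG"
# Amino acids for the 64 codons in NCBI order (TTT, TTC, TTA, TTG, TCT, ...).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def build_table(amino_acids):
    """Map each codon to its amino acid ('*' marks a stop codon)."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, amino_acids))

CODE_TABLES = {
    1: build_table(STANDARD),
    # Table 11: same amino acid assignments as table 1.
    11: build_table(STANDARD),
    # Table 4: TGA (index 14 in NCBI order) codes for Trp, not stop.
    4: build_table(STANDARD[:14] + "W" + STANDARD[15:]),
}

def translate(dna, transl_table=1):
    """Translate frame 1 of dna; unknown codons become 'X'."""
    table = CODE_TABLES[transl_table]
    dna = dna.upper()
    return "".join(table.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def circular_window(genome, start, length):
    """Return `length` bases from 0-based `start`, wrapping past the end.

    On a circular genome a gene may span the point where the linear
    sequence record begins and ends.
    """
    n = len(genome)
    return "".join(genome[(start + k) % n] for k in range(length))

orf = "ATGTGATAA"
standard_protein = translate(orf, 1)
mycoplasma_protein = translate(orf, 4)
wrapped = circular_window("GATAAATGT", 5, 9)
```

With the Standard code the reading frame above stops at TGA, while
under transl_table=4 the same codon is read through as tryptophan, so
the choice of code changes where predicted genes end.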
FgenesV output:
FGENESV: Prediction of potential genes in viral genomes
Time: Tue Oct 22 16:17:25 2002
Seq name: NC_001838 Common chimpanzee papillomavirus 1, complete genome.
Length of sequence - 7889 bp
Number of predicted genes - 8
 N  S        Start     End   Score
 1  +  CDS     101 -   559     693
 2  -  CDS     551 -   907     232
 3  +  CDS     840 -  2786    3253
 4  +  CDS    2728 -  3858     938
 5  +  CDS    3901 -  4185     298
 6  +  CDS    4195 -  4335     131
 7  +  CDS    4371 -  5759    2263
 8  +  CDS    5746 -  7251    1943
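For anyone who wants to post-process this table, here is a minimal
Python sketch of a parser. The regular expression and the names
parse_gene_table and protein_length are assumptions based only on the
example output shown here, not on a documented format. In this example
the reported protein lengths agree with (End - Start + 1) / 3 - 1,
which suggests that the coordinates include the stop codon.

```python
import re

# number, strand, feature type, start, "-", end, score
GENE_LINE = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\w+)\s+(\d+)\s+-\s+(\d+)\s+(\d+)\s*$")

def parse_gene_table(text):
    """Return one dict per gene row; all other lines are skipped."""
    genes = []
    for line in text.splitlines():
        m = GENE_LINE.match(line)
        if m:
            number, strand, feature, start, end, score = m.groups()
            genes.append({
                "number": int(number),
                "strand": strand,
                "feature": feature,
                "start": int(start),
                "end": int(end),
                "score": int(score),
            })
    return genes

def protein_length(gene):
    """Codons spanned by the coordinates, minus one for the stop codon."""
    return (gene["end"] - gene["start"] + 1) // 3 - 1

SAMPLE = """\
Length of sequence - 7889 bp
Number of predicted genes - 8
N S Start End Score
1 + CDS 101 - 559 693
2 - CDS 551 - 907 232
3 + CDS 840 - 2786 3253
"""

genes = parse_gene_table(SAMPLE)
lengths = [protein_length(g) for g in genes]
```

The lengths computed for the three sample rows can be compared with the
"aa" values in the protein headers of the output.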
Predicted protein(s):
>GENE 1 101 - 559 152 aa, chain +
ESVNASTPAKTIDQLCKDCNLCMHSLQILCVFCKKTLSTAAAEVYSFEYKDLYIVWRGN
PFAACAYCLELQGKVNQYRHFDYAAYAVTVEEETNKSIFDIRIRCYLCHKPLCAVEKVR
HILEKARFIKLNCEWKGRCFHCWTSCMENILP
>GENE 2 551 - 907 118 aa, chain -
STKNHPEHPVPSLSVPVSSAILLKFVEHTTGTLYSMSPAASCVGVECQVLYTPQPDAR
CYYSDHNWSLFGKLVGFVAWLAWLAWLVRPPHLLSCLIAHCNVDLQGQDSGVTQCPLR
>GENE 3 840 - 2786 648 aa, chain +
MADDTGTDNEGTGCSGWFLVEAIVDKTTGEQVSDDEDETVEDSGLDMVDFIDDRPITHNS
LEAQALLNEQEADAHYAAVQDLKRKYLGSPYVSPLGHIEQSVDCDISPRLDAIQLSRKPK
KVKRRLFQSREITDSGYGYSEVETATQVERYGEPENGCGGGGDGREKEGEGQVHTEVHTE
SEIEQHTGTTRVLELLKCKDVRATLHGKFKECYGLSFKDLTREFKSDKTTCGDWVVAGFG
VHHSVSEAFQKLIQPLSTYSHIQWLTNYKCMGMVLLVLLRFKVNKNRCTVARTLATLLNI
PEDHMLIEPPKIQSSVAALYWFRTSISNASIVTGDTPEWIARQTIVEHGLADNQFKLTEM
VQWAYDNDYCDESDIAFEYAQRADFDSNAKAFLNSNCQAKYVKDCATMCKHYKNAEMKKM
SIKQWIKYRSNKIDETGNWKPIVQFLRHQGIEFISFLSKLKLWLHGTPKKNCIAIVGPPD
TGKSAFCMSLIKFLGGTVISYVNSSSHFWLQPLCNAKVALLDDATQSCWGYMDTYMRNLL
DGNPMSIDRKHKSLALIKCPPLLVTSNIDITTEERYKYLYSRVTLFKFPNPFPFDSNGNA
VYELCDANWKCFFARLSASLDIQDSEDEDDGDTSQAFRCVPGTVVRTV
>GENE 4 2728 - 3858 376 aa, chain +
METLAKHLDACQEQLLELYEENSNELKKHIQHWKCVRYENVLLHKARQMGISHIGPQVVP
PLQVSQTKGHEAIEMQMRIETLLKSQFGMEPWTLQDTSFEMWLTPPKHCFKKQGKTVEVK
YDCNAENTMHYVLWKYIYVYNTEKEIWLKVKGMVDYKGLYYMMEQCKTYYVDFEKEAKQY
GKTLQWEVCFDSTVICSPASVSSTVQEVSNAGPTSYSTTLAQATYTVPSSVSEECVQAPP
SKRQRGPSQSAGKTQHTCNIVCDTDCATLDSANNNINNNSYSSNNGRNNSYCTGTPIVQL
QGDSNNLKCFRYRLHSNYKHLFFACISTWHWTASSNSPKTAIVTLTYVNEQQRQEFLNTV
KIPGTITHKLGFVAIM
>GENE 5 3901 - 4185 94 aa, chain +
MELQVVPVDVTTTTTNASLLPLLIALTVCLISIILLVFVSEFVIYSSVLVLTLLIYLLLW
LLLTTHLQFYLLTLSLCFIPAFSVHQYILQTQQL
>GENE 6 4195 - 4335 46 aa, chain +
MLTCSFDDGDTWLLLWLLASLIVAILGLLLLYLKAVHIHSHSCCSK
>GENE 7 4371 - 5759 462 aa, chain +
MAHSRPRRRKRASATQLYQTCKASGTCPDIIPKVEQNTLADKILKWGSLGVFFGGLGIGT
GSGTGGRTGYVPLESAPRPAIPFGPTARPPIVVDTVGPTDSSIVSLVEDSAIINSGASDL
VPSIHGGFEISTSESTTPAILDVSITTHNTTSTSIFRNPAFAEPSIVQSQPSVEAGGHLL
TSTFTSTISPHSVEEIPLDTFIVSSSNSNPASSTPVPTTVARPRLGLYSKALHQVQVTDP
AFLSSPQRLITFDNPVYEGEDISLHFEHNSIHEPPNEAFMDIIRLHRPAITSRRGVVRFS
RIGQRGSMYTRSGKHIGGRVHFFTDISPISADAQDIELQPLVAAAQDDSDLFDIYVDPDT
TPVAVDNIPSANSTLFIKSSIFDTSWGNTTIPLSLPNNIFVQPGPDILFPTTPAVPPYGP
VISPLPVGPVFISGSEFYLHPSLYFARKRRKRVSLFFSDVAA
>GENE 8 5746 - 7251 501 aa, chain +
MWRPSDNKLYVPPPAPVSKVLTTDAYVTRTKIFYHASSSRLLAVGNPYFPIRKANKTIVP
KVSGFQFRVFKIVLPDPNKFALPDTSIFDSTSQRLVWACIGLEVGRGQPLGVGYCGHPCL
NKFDDVENSASYAVNPGQDNRVNVAMDYKQTQLCLVGCAPPLGEHWGKGKQCSGVSVQDG
DCPPLELVTSVIQDGDMVDTGFGAMDFAELQSNKSDVPLDICTSTCKYPDYLQMAADPYG
DRLFFYLRKEQMFARHFFNRAGTVGEQIPDELFVKGTTSRATVSSNIYFNTPSGSLVSSE
AQLFNKPYWLHKAQGHNNGICWGNTLFVTVVDTTRSTNMTVCASTTSSPSATYTASEYKQ
YMRHVEEFDLQFIFQLCTIKLTAELMAYIHTMNPTVLEEWNFGLSPPPNGTLEDTYRYVQ
SQAITCQKPTPDKEKQDPYAGLSFWEVNLKEKFSSELEQYPLGRKFLLQTGVQSTSLARA
GTKRAASTSTATPTRKKVKRK
---
=====
Moderated
bionet.genome.gene-structure