FGENESV - Finding Genes in genomes of RNA and DNA viruses

Victor victor at softberry.com
Thu Oct 24 06:08:07 EST 2002


      New: FGENESV - Finding Genes in genomes of RNA and DNA viruses

              The program is available for online use at:

              http://www.softberry.com/berry.phtml?topic=gfindv

  Method description:

The FGENESV algorithm is based on pattern recognition of different types of
signals and on Markov chain models of coding regions. The optimal combination
of these features is then found by dynamic programming, and a set of gene
models is constructed along the given sequence. FGENESV is the fastest ab
initio viral gene prediction program available.
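
To illustrate what a Markov chain model of coding regions does, here is a
minimal, generic sketch. It is not FGENESV's actual model, whose order and
parameters are not given in this post. A second-order, three-periodic chain
(probabilities depend on codon position) trained on coding examples is compared
with a homogeneous chain trained on non-coding examples, and a candidate region
is scored by the log-likelihood ratio. The function names, the order and the +1
pseudocounts are illustrative assumptions; sequences are assumed to be uppercase
ACGT and to start at the first base of a codon.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def _key(seq, i, order, periodic):
    # Context = the `order` preceding bases, plus codon position if periodic.
    ctx = seq[i - order:i]
    return (i % 3, ctx) if periodic else ctx


def train_markov(seqs, order=2, periodic=False):
    """Estimate P(base | context) from training sequences with +1 pseudocounts."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i in range(order, len(s)):
            counts[_key(s, i, order, periodic)][s[i]] += 1
    model = {}
    for key, c in counts.items():
        total = sum(c.values()) + len(ALPHABET)
        model[key] = {b: (c.get(b, 0) + 1) / total for b in ALPHABET}
    return model


def log_prob(model, seq, order=2, periodic=False):
    """Log-probability of seq; contexts never seen in training fall back to 1/4."""
    lp = 0.0
    for i in range(order, len(seq)):
        probs = model.get(_key(seq, i, order, periodic))
        lp += math.log(probs[seq[i]] if probs else 1 / len(ALPHABET))
    return lp


def coding_score(seq, coding, noncoding, order=2):
    """Log-likelihood ratio: > 0 favours 'coding', < 0 favours 'non-coding'."""
    return (log_prob(coding, seq, order, periodic=True)
            - log_prob(noncoding, seq, order, periodic=False))


# Toy usage with deliberately artificial training data.
coding = train_markov(["GCTGCTGCTGCTGCTGCTGCT"], periodic=True)
noncoding = train_markov(["ATATATATATATATATATATAT"])
print(coding_score("GCTGCTGCTGCT", coding, noncoding) > 0)
```

In a full gene finder, region scores of this kind would be combined with signal
scores (starts, stops and so on) by dynamic programming; that step is omitted
here.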

We have developed two variants of gene prediction. FGENESV0, suited to small
genomes (< 10,000 bp), uses generic parameters for coding regions, while
FGENESV learns genome-specific parameters directly from the input viral
genome sequence.
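
How FGENESV learns its genome-specific parameters is not described in this
post. One common approach in self-training gene finders is to collect long open
reading frames from the input sequence itself and fit the coding model to them.
The sketch below shows only that collection step. The function names, the ATG
start rule, the Standard-code stop codons and the `min_codons` threshold are
hypothetical choices for illustration. A short genome yields few such ORFs,
which may be one reason a generic-parameter variant is offered for small
genomes.

```python
STOPS = {"TAA", "TAG", "TGA"}  # Standard-code stop codons (assumed)
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMP)[::-1]


def long_orfs(seq, min_codons=100):
    """ATG..stop ORFs with at least `min_codons` codons (stop codon included).

    Returns (strand, start, end) tuples in 1-based inclusive forward-strand
    coordinates. Only the first in-frame ATG after the previous stop (or the
    frame start) opens an ORF, so each ORF is reported at its longest extent.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        if strand == "+":
                            hits.append(("+", start + 1, i + 3))
                        else:
                            # Map reverse-complement indices back to the forward strand.
                            hits.append(("-", n - i - 2, n - start))
                    start = None
    return hits


print(long_orfs("ATG" + "GCT" * 5 + "TAA", min_codons=7))
```

In a self-training scheme, the sequences of these ORFs would then be used as
coding examples when estimating the coding-region model.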

FGENESV predicts the intron-less genes of viruses. However, a few percent of
viral genes contain introns, and such genes are often alternatives to an
intron-less variant. To find them, please use a standard eukaryotic gene
finding program (such as FGENESH) in addition to FGENESV.

As additional parameters, you can choose the Linear or Circular form of your
virus and select an alternative genetic code (the Standard code is the
default): the Bacterial and Plant Plastid Code (transl_table=11), or the Mold,
Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma
Code (transl_table=4).
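
To show what these two options change, here is an illustrative sketch, not
FGENESV's code. It builds codon tables for the Standard code,
transl_table=11 and transl_table=4 from the NCBI amino-acid strings. Under
table 4, TGA encodes Trp instead of a stop; table 11 assigns the same amino
acids as the Standard code and differs only in which codons may act as starts.
It also extracts a CDS from a circular genome in which a gene runs across the
origin. Writing such a gene with an End coordinate greater than the sequence
length is an assumed convention; this post does not say how FGENESV reports
origin-spanning genes. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (NCBI layout); '*' = stop.
AA_STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# transl_table=4: TGA (index 14 in TCAG order) codes for Trp instead of stop.
AA_TABLE4 = AA_STANDARD[:14] + "W" + AA_STANDARD[15:]
# transl_table=11 has the Standard amino-acid assignments (only starts differ).
AA_BY_TABLE = {1: AA_STANDARD, 11: AA_STANDARD, 4: AA_TABLE4}

COMP = str.maketrans("ACGT", "TGCA")


def codon_table(table_id=1):
    """Map each codon (first base varying slowest) to its amino acid."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, AA_BY_TABLE[table_id]))


def translate(cds, table_id=1):
    """Translate codon by codon; codons with non-ACGT characters become 'X'."""
    table = codon_table(table_id)
    cds = cds.upper()
    return "".join(table.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def extract_cds(genome, start, end, strand="+", circular=False):
    """Return the CDS for 1-based inclusive coordinates on the forward strand.

    On a circular genome, `end` may exceed len(genome) (up to 2 * len(genome))
    to denote a gene spanning the origin; this is an assumed convention.
    """
    n = len(genome)
    if end > n:
        if not circular:
            raise ValueError("end lies beyond the end of a linear genome")
        genome = genome + genome[:end - n]
    seq = genome[start - 1:end].upper()
    return seq.translate(COMP)[::-1] if strand == "-" else seq


print(translate("ATGTGATAA", 1), translate("ATGTGATAA", 4))
```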

  FGENESV output for a sample genome (N = gene number, S = strand):

 FGENESV: Prediction of potential genes in viral  genomes
 Time:   Tue Oct 22 16:17:25 2002
 Seq name: NC_001838 Common chimpanzee papillomavirus 1, complete genome.
 Length of sequence - 7889 bp
 Number of predicted genes - 8
     N   S             Start         End    Score

     1   +    CDS        101 -       559      693
     2   -    CDS        551 -       907      232
     3   +    CDS        840 -      2786     3253
     4   +    CDS       2728 -      3858      938
     5   +    CDS       3901 -      4185      298
     6   +    CDS       4195 -      4335      131
     7   +    CDS       4371 -      5759     2263
     8   +    CDS       5746 -      7251     1943
Predicted protein(s):
>GENE     1       101  -       559    152 aa, chain +
MESVNASTPAKTIDQLCKDCNLCMHSLQILCVFCKKTLSTAAAEVYSFEYKDLYIVWRGN
FPFAACAYCLELQGKVNQYRHFDYAAYAVTVEEETNKSIFDIRIRCYLCHKPLCAVEKVR
HILEKARFIKLNCEWKGRCFHCWTSCMENILP
>GENE     2       551  -       907    118 aa, chain -
MASTKNHPEHPVPSLSVPVSSAILLKFVEHTTGTLYSMSPAASCVGVECQVLYTPQPDAR
CYYSDHNWSLFGKLVGFVAWLAWLAWLVRPPHLLSCLIAHCNVDLQGQDSGVTQCPLR
>GENE     3       840  -      2786    648 aa, chain +
MADDTGTDNEGTGCSGWFLVEAIVDKTTGEQVSDDEDETVEDSGLDMVDFIDDRPITHNS
LEAQALLNEQEADAHYAAVQDLKRKYLGSPYVSPLGHIEQSVDCDISPRLDAIQLSRKPK
KVKRRLFQSREITDSGYGYSEVETATQVERYGEPENGCGGGGDGREKEGEGQVHTEVHTE
SEIEQHTGTTRVLELLKCKDVRATLHGKFKECYGLSFKDLTREFKSDKTTCGDWVVAGFG
VHHSVSEAFQKLIQPLSTYSHIQWLTNYKCMGMVLLVLLRFKVNKNRCTVARTLATLLNI
PEDHMLIEPPKIQSSVAALYWFRTSISNASIVTGDTPEWIARQTIVEHGLADNQFKLTEM
VQWAYDNDYCDESDIAFEYAQRADFDSNAKAFLNSNCQAKYVKDCATMCKHYKNAEMKKM
SIKQWIKYRSNKIDETGNWKPIVQFLRHQGIEFISFLSKLKLWLHGTPKKNCIAIVGPPD
TGKSAFCMSLIKFLGGTVISYVNSSSHFWLQPLCNAKVALLDDATQSCWGYMDTYMRNLL
DGNPMSIDRKHKSLALIKCPPLLVTSNIDITTEERYKYLYSRVTLFKFPNPFPFDSNGNA
VYELCDANWKCFFARLSASLDIQDSEDEDDGDTSQAFRCVPGTVVRTV
>GENE     4      2728  -      3858    376 aa, chain +
METLAKHLDACQEQLLELYEENSNELKKHIQHWKCVRYENVLLHKARQMGISHIGPQVVP
PLQVSQTKGHEAIEMQMRIETLLKSQFGMEPWTLQDTSFEMWLTPPKHCFKKQGKTVEVK
YDCNAENTMHYVLWKYIYVYNTEKEIWLKVKGMVDYKGLYYMMEQCKTYYVDFEKEAKQY
GKTLQWEVCFDSTVICSPASVSSTVQEVSNAGPTSYSTTLAQATYTVPSSVSEECVQAPP
SKRQRGPSQSAGKTQHTCNIVCDTDCATLDSANNNINNNSYSSNNGRNNSYCTGTPIVQL
QGDSNNLKCFRYRLHSNYKHLFFACISTWHWTASSNSPKTAIVTLTYVNEQQRQEFLNTV
KIPGTITHKLGFVAIM
>GENE     5      3901  -      4185     94 aa, chain +
MELQVVPVDVTTTTTNASLLPLLIALTVCLISIILLVFVSEFVIYSSVLVLTLLIYLLLW
LLLTTHLQFYLLTLSLCFIPAFSVHQYILQTQQL
>GENE     6      4195  -      4335     46 aa, chain +
MLTCSFDDGDTWLLLWLLASLIVAILGLLLLYLKAVHIHSHSCCSK
>GENE     7      4371  -      5759    462 aa, chain +
MAHSRPRRRKRASATQLYQTCKASGTCPDIIPKVEQNTLADKILKWGSLGVFFGGLGIGT
GSGTGGRTGYVPLESAPRPAIPFGPTARPPIVVDTVGPTDSSIVSLVEDSAIINSGASDL
VPSIHGGFEISTSESTTPAILDVSITTHNTTSTSIFRNPAFAEPSIVQSQPSVEAGGHLL
TSTFTSTISPHSVEEIPLDTFIVSSSNSNPASSTPVPTTVARPRLGLYSKALHQVQVTDP
AFLSSPQRLITFDNPVYEGEDISLHFEHNSIHEPPNEAFMDIIRLHRPAITSRRGVVRFS
RIGQRGSMYTRSGKHIGGRVHFFTDISPISADAQDIELQPLVAAAQDDSDLFDIYVDPDT
TPVAVDNIPSANSTLFIKSSIFDTSWGNTTIPLSLPNNIFVQPGPDILFPTTPAVPPYGP
VISPLPVGPVFISGSEFYLHPSLYFARKRRKRVSLFFSDVAA
>GENE     8      5746  -      7251    501 aa, chain +
MWRPSDNKLYVPPPAPVSKVLTTDAYVTRTKIFYHASSSRLLAVGNPYFPIRKANKTIVP
KVSGFQFRVFKIVLPDPNKFALPDTSIFDSTSQRLVWACIGLEVGRGQPLGVGYCGHPCL
NKFDDVENSASYAVNPGQDNRVNVAMDYKQTQLCLVGCAPPLGEHWGKGKQCSGVSVQDG
DCPPLELVTSVIQDGDMVDTGFGAMDFAELQSNKSDVPLDICTSTCKYPDYLQMAADPYG
DRLFFYLRKEQMFARHFFNRAGTVGEQIPDELFVKGTTSRATVSSNIYFNTPSGSLVSSE
AQLFNKPYWLHKAQGHNNGICWGNTLFVTVVDTTRSTNMTVCASTTSSPSATYTASEYKQ
YMRHVEEFDLQFIFQLCTIKLTAELMAYIHTMNPTVLEEWNFGLSPPPNGTLEDTYRYVQ
SQAITCQKPTPDKEKQDPYAGLSFWEVNLKEKFSSELEQYPLGRKFLLQTGVQSTSLARA
GTKRAASTSTATPTRKKVKRK

---




More information about the Bio-www mailing list