FGENESV - Finding Genes in genomes of RNA

Victor victor at softberry.com
Wed Oct 23 02:12:52 EST 2002


 The program is available for online use at:

 http://www.softberry.com/berry.phtml?topic=gfindv
 
   Method description:
 
 The FGENESV algorithm is based on pattern recognition of different types of
 signals and on Markov chain models of coding regions. The optimal combination
 of these features is then found by dynamic programming, and a set of gene
 models is constructed along the given sequence. FGENESV is the fastest
 ab initio viral gene prediction program available.

 We have developed two variants of gene prediction: FGENESV0 (suited to small
 genomes, < 10000 bp) uses generic parameters of coding regions, while FGENESV
 learns genome-specific parameters from the input viral genome sequence alone.
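
 To make the idea of a Markov chain model of coding regions concrete, here is
 a minimal Python sketch. It is not FGENESV code and does not reproduce its
 parameters: the model order, the pseudocount, the toy training sequences and
 the function names (train_markov, log_odds) are all assumptions made for
 illustration. It trains one model on "coding" examples and one on
 "background" examples, then scores a candidate region by its log-odds.

```python
import math
from collections import defaultdict


def train_markov(seqs, order=2, pseudo=1.0):
    """Return logp(context, base) for a fixed-order Markov chain.

    Illustrative only: the order and pseudocount are assumptions, not
    values taken from FGENESV.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0

    def logp(context, base):
        row = counts.get(context, {})
        total = sum(row.values()) + 4 * pseudo  # 4 possible nucleotides
        return math.log((row.get(base, 0.0) + pseudo) / total)

    return logp


def log_odds(seq, coding, background, order=2):
    """Sum of log P_coding / P_background over every position with a full context."""
    return sum(
        coding(seq[i - order:i], seq[i]) - background(seq[i - order:i], seq[i])
        for i in range(order, len(seq))
    )


# Toy training data (hypothetical, chosen only to make the two models differ).
coding_model = train_markov(["GCC" * 20])
background_model = train_markov(["AT" * 30])

coding_like = log_odds("GCCGCCGCC", coding_model, background_model)
background_like = log_odds("ATATATATA", coding_model, background_model)
```

 A positive score means the region looks more like the coding training set
 than the background; in a gene finder such scores would be one of several
 features combined by the dynamic programming step.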
 
 FGENESV predicts intron-less viral genes. However, a few percent of viral
 genes contain introns, and such genes are often alternatives to an
 intron-less variant. To find them, please use a standard eukaryotic
 gene-finding program (such as FGENESH) in addition to FGENESV.
 
 As additional parameters you can choose the Linear or Circular form of your
 virus and select an alternative genetic code (the Standard code is the
 default): the Bacterial and Plant Plastid Code (transl_table=11), or the
 Mold, Protozoan, and Coelenterate Mitochondrial Code and the
 Mycoplasma/Spiroplasma Code (transl_table=4).
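
 The effect of these two options can be sketched in a few lines of Python.
 This is an illustration, not FGENESV code: the names translate and
 circular_slice are made up here. The amino-acid strings are the NCBI ones
 for transl_table 1 and 4; transl_table 11 assigns the same amino acids as
 table 1 and differs only in its permitted start codons, so it is mapped to
 the same string in this sketch.

```python
from itertools import product

BASES = "TCAG"  # NCBI codon ordering: TTT, TTC, TTA, TTG, TCT, ...
AA = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    4: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
}
AA[11] = AA[1]  # same codon-to-amino-acid map; only start codons differ


def codon_table(transl_table=1):
    """Map each of the 64 codons to its one-letter amino acid ('*' = stop)."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, AA[transl_table]))


def translate(dna, transl_table=1):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    table = codon_table(transl_table)
    dna = dna.upper()
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def circular_slice(genome, start, end):
    """1-based inclusive slice; if end < start the region wraps past the origin."""
    if start <= end:
        return genome[start - 1:end]
    return genome[start - 1:] + genome[:end]
```

 For example, translate("ATGTGATAA") gives "M**" with the Standard code but
 "MW*" with transl_table=4, where TGA encodes tryptophan instead of a stop.
 On a circular genome, circular_slice lets a candidate gene run across the
 end of the submitted sequence.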
 
   FGENESV output:
 
  FGENESV: Prediction of potential genes in viral genomes
  Time:   Tue Oct 22 16:17:25 2002
  Seq name: NC_001838 Common chimpanzee papillomavirus 1, complete genome.
  Length of sequence - 7889 bp
  Number of predicted genes - 8
      N   S             Start         End    Score
 
      1   +    CDS        101 -       559      693
      2   -    CDS        551 -       907      232
      3   +    CDS        840 -      2786     3253
      4   +    CDS       2728 -      3858      938
      5   +    CDS       3901 -      4185      298
      6   +    CDS       4195 -      4335      131
      7   +    CDS       4371 -      5759     2263
      8   +    CDS       5746 -      7251     1943
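
 For anyone post-processing this table, here is a small Python sketch that
 reads the gene rows into records. The column layout is inferred only from
 the sample above (gene number, strand, feature type CDS, start, end, score);
 the Gene type and parse_table name are hypothetical, and treating the score
 as an integer is an assumption based on this one example.

```python
import re
from typing import List, NamedTuple


class Gene(NamedTuple):
    number: int
    strand: str
    start: int
    end: int
    score: int


# One table row, e.g. "    2   -    CDS        551 -       907      232"
ROW = re.compile(r"^\s*(\d+)\s+([+-])\s+CDS\s+(\d+)\s+-\s+(\d+)\s+(\d+)\s*$")


def parse_table(text: str) -> List[Gene]:
    """Collect every line that looks like a gene row; ignore everything else."""
    genes = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            n, strand, start, end, score = m.groups()
            genes.append(Gene(int(n), strand, int(start), int(end), int(score)))
    return genes


sample = """\
     N   S             Start         End    Score

     1   +    CDS        101 -       559      693
     2   -    CDS        551 -       907      232
"""
genes = parse_table(sample)
```

 Header and blank lines simply fail to match the pattern, so the same
 function can be run over the whole output.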
 Predicted protein(s):
>GENE     1       101  -       559    152 aa, chain +
ESVNASTPAKTIDQLCKDCNLCMHSLQILCVFCKKTLSTAAAEVYSFEYKDLYIVWRGN
PFAACAYCLELQGKVNQYRHFDYAAYAVTVEEETNKSIFDIRIRCYLCHKPLCAVEKVR
HILEKARFIKLNCEWKGRCFHCWTSCMENILP
>GENE     2       551  -       907    118 aa, chain -
STKNHPEHPVPSLSVPVSSAILLKFVEHTTGTLYSMSPAASCVGVECQVLYTPQPDAR
CYYSDHNWSLFGKLVGFVAWLAWLAWLVRPPHLLSCLIAHCNVDLQGQDSGVTQCPLR
>GENE     3       840  -      2786    648 aa, chain +
MADDTGTDNEGTGCSGWFLVEAIVDKTTGEQVSDDEDETVEDSGLDMVDFIDDRPITHNS
LEAQALLNEQEADAHYAAVQDLKRKYLGSPYVSPLGHIEQSVDCDISPRLDAIQLSRKPK
KVKRRLFQSREITDSGYGYSEVETATQVERYGEPENGCGGGGDGREKEGEGQVHTEVHTE
SEIEQHTGTTRVLELLKCKDVRATLHGKFKECYGLSFKDLTREFKSDKTTCGDWVVAGFG
VHHSVSEAFQKLIQPLSTYSHIQWLTNYKCMGMVLLVLLRFKVNKNRCTVARTLATLLNI
PEDHMLIEPPKIQSSVAALYWFRTSISNASIVTGDTPEWIARQTIVEHGLADNQFKLTEM
VQWAYDNDYCDESDIAFEYAQRADFDSNAKAFLNSNCQAKYVKDCATMCKHYKNAEMKKM
SIKQWIKYRSNKIDETGNWKPIVQFLRHQGIEFISFLSKLKLWLHGTPKKNCIAIVGPPD
TGKSAFCMSLIKFLGGTVISYVNSSSHFWLQPLCNAKVALLDDATQSCWGYMDTYMRNLL
DGNPMSIDRKHKSLALIKCPPLLVTSNIDITTEERYKYLYSRVTLFKFPNPFPFDSNGNA
VYELCDANWKCFFARLSASLDIQDSEDEDDGDTSQAFRCVPGTVVRTV
>GENE     4      2728  -      3858    376 aa, chain +
METLAKHLDACQEQLLELYEENSNELKKHIQHWKCVRYENVLLHKARQMGISHIGPQVVP
PLQVSQTKGHEAIEMQMRIETLLKSQFGMEPWTLQDTSFEMWLTPPKHCFKKQGKTVEVK
YDCNAENTMHYVLWKYIYVYNTEKEIWLKVKGMVDYKGLYYMMEQCKTYYVDFEKEAKQY
GKTLQWEVCFDSTVICSPASVSSTVQEVSNAGPTSYSTTLAQATYTVPSSVSEECVQAPP
SKRQRGPSQSAGKTQHTCNIVCDTDCATLDSANNNINNNSYSSNNGRNNSYCTGTPIVQL
QGDSNNLKCFRYRLHSNYKHLFFACISTWHWTASSNSPKTAIVTLTYVNEQQRQEFLNTV
KIPGTITHKLGFVAIM
>GENE     5      3901  -      4185     94 aa, chain +
MELQVVPVDVTTTTTNASLLPLLIALTVCLISIILLVFVSEFVIYSSVLVLTLLIYLLLW
LLLTTHLQFYLLTLSLCFIPAFSVHQYILQTQQL
>GENE     6      4195  -      4335     46 aa, chain +
MLTCSFDDGDTWLLLWLLASLIVAILGLLLLYLKAVHIHSHSCCSK
>GENE     7      4371  -      5759    462 aa, chain +
MAHSRPRRRKRASATQLYQTCKASGTCPDIIPKVEQNTLADKILKWGSLGVFFGGLGIGT
GSGTGGRTGYVPLESAPRPAIPFGPTARPPIVVDTVGPTDSSIVSLVEDSAIINSGASDL
VPSIHGGFEISTSESTTPAILDVSITTHNTTSTSIFRNPAFAEPSIVQSQPSVEAGGHLL
TSTFTSTISPHSVEEIPLDTFIVSSSNSNPASSTPVPTTVARPRLGLYSKALHQVQVTDP
AFLSSPQRLITFDNPVYEGEDISLHFEHNSIHEPPNEAFMDIIRLHRPAITSRRGVVRFS
RIGQRGSMYTRSGKHIGGRVHFFTDISPISADAQDIELQPLVAAAQDDSDLFDIYVDPDT
TPVAVDNIPSANSTLFIKSSIFDTSWGNTTIPLSLPNNIFVQPGPDILFPTTPAVPPYGP
VISPLPVGPVFISGSEFYLHPSLYFARKRRKRVSLFFSDVAA
>GENE     8      5746  -      7251    501 aa, chain +
MWRPSDNKLYVPPPAPVSKVLTTDAYVTRTKIFYHASSSRLLAVGNPYFPIRKANKTIVP
KVSGFQFRVFKIVLPDPNKFALPDTSIFDSTSQRLVWACIGLEVGRGQPLGVGYCGHPCL
NKFDDVENSASYAVNPGQDNRVNVAMDYKQTQLCLVGCAPPLGEHWGKGKQCSGVSVQDG
DCPPLELVTSVIQDGDMVDTGFGAMDFAELQSNKSDVPLDICTSTCKYPDYLQMAADPYG
DRLFFYLRKEQMFARHFFNRAGTVGEQIPDELFVKGTTSRATVSSNIYFNTPSGSLVSSE
AQLFNKPYWLHKAQGHNNGICWGNTLFVTVVDTTRSTNMTVCASTTSSPSATYTASEYKQ
YMRHVEEFDLQFIFQLCTIKLTAELMAYIHTMNPTVLEEWNFGLSPPPNGTLEDTYRYVQ
SQAITCQKPTPDKEKQDPYAGLSFWEVNLKEKFSSELEQYPLGRKFLLQTGVQSTSLARA
GTKRAASTSTATPTRKKVKRK
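
 A quick arithmetic check ties the protein headers to the coordinates in the
 table: in this sample, each reported protein length equals the number of
 codons spanned by Start..End minus one, which suggests the coordinates
 include the stop codon. For gene 1, (559 - 101 + 1) / 3 = 153 codons, i.e.
 152 aa plus a stop. The Python sketch below applies the same check to a
 header line; the function name check_header is hypothetical and the rule is
 inferred from this one output, not from FGENESV documentation.

```python
import re

# Header as printed in the output, e.g.
# ">GENE     1       101  -       559    152 aa, chain +"
HEADER = re.compile(
    r">?\s*GENE\s+(\d+)\s+(\d+)\s+-\s+(\d+)\s+(\d+)\s+aa,\s+chain\s+([+-])"
)


def check_header(header):
    """Return (gene number, reported aa, aa expected from the coordinates)."""
    m = HEADER.search(header)
    if m is None:
        raise ValueError("not a GENE header: %r" % header)
    number, start, end, reported = (int(g) for g in m.groups()[:4])
    codons = (end - start + 1) // 3
    expected = codons - 1  # assumption: the last codon is the stop codon
    return number, reported, expected


result_1 = check_header(">GENE     1       101  -       559    152 aa, chain +")
result_8 = check_header(">GENE     8      5746  -      7251    501 aa, chain +")
```

 Both calls return matching reported and expected lengths, and the same
 holds for the other six genes in the table.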

Moderated
bionet.genome.gene-structure


