FGENESH+ New GeneFinder with Similarity usage

Victor Solovyev solovyev at sanger.ac.uk
Mon Jul 26 16:19:14 EST 1999


      We have installed a new HMM-based gene-finding program, FGENESH+,
 for multiple gene prediction in genomic DNA, using information
                      from a similar protein
	             http://genomic.sanger.ac.uk/
   The Web version of FGENESH+ can analyse Human, Drosophila,
Nematode and Plant sequences (and genes of closely related organisms).

The program can be used when you know a protein sequence similar to
the protein encoded by the gene in your sequence.
First run any ab initio gene-finding program, such as FGENES or FGENESH.
Then run a BLASTP database search with each predicted exon. Any correctly
predicted exon can lead you to a known similar protein
(if such a protein exists in the database). Take this protein and run FGENESH+.
The accuracy of gene prediction can reach 100%, depending on how similar the
predicted protein is to the database protein.
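
The selection step above (picking one database protein from the BLASTP
results of all predicted exons) can be sketched as follows. This is only an
illustration: the Hit record, its field names, the E-value cutoff and the
example values are all hypothetical, and the hits are assumed to have been
collected from BLASTP reports beforehand.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Hit:
    """One BLASTP hit for one predicted exon (hypothetical record layout)."""
    exon_index: int          # which predicted exon was used as the query
    protein_id: str          # database identifier of the matched protein
    percent_identity: float  # identity reported for the alignment
    evalue: float            # E-value reported for the alignment


def best_hit(hits: Iterable[Hit], max_evalue: float = 1e-5) -> Optional[Hit]:
    """Return the most significant hit, or None if none passes the cutoff.

    The cutoff of 1e-5 is an arbitrary assumption, not a recommended value.
    """
    significant = [h for h in hits if h.evalue <= max_evalue]
    if not significant:
        return None
    # Lowest E-value wins; ties are broken by the higher identity.
    return min(significant, key=lambda h: (h.evalue, -h.percent_identity))


# Made-up example values, for illustration only.
hits = [
    Hit(1, "protA", 42.0, 3e-2),
    Hit(2, "protB", 88.0, 1e-20),
    Hit(3, "protB", 91.0, 1e-25),
]
chosen = best_hit(hits)
```

The protein named by chosen.protein_id would then be the one to submit
together with the genomic sequence.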

Ab initio gene prediction programs usually predict a significant
portion of the exons in a gene correctly, but they often fail to predict
the whole gene structure correctly: they merge several genes into one,
split one gene into several, or miss or overpredict exons.
Using the similarity information provided by one or several correctly
predicted exons, we can significantly improve the accuracy of gene finding.

You should provide a similarity value (known from the BLAST search).
It affects the prediction: a very low similarity value
allows the predicted gene to encode a protein that deviates more from
the known similar protein.
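
If only the aligned sequences from the BLAST report are at hand, a
similarity value can be estimated as sketched below. This assumes that
percent identity over the aligned (non-gap) columns is an acceptable value.
The text above does not say whether identity or BLAST's "Positives" figure
is expected, so treat that as an assumption. The function name is
hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither side is a gap.

    Both arguments are the two rows of one pairwise alignment, with '-'
    marking gaps. Gap columns are skipped (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Made-up alignment: 4 identical residues out of 5 non-gap columns.
example = percent_identity("MKT-AY", "MKTLAF")
```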

  TO USE the Human-specific version, select (mark) the Human button and the
  fgenesh button.
  TO USE another organism-specific version, select Drosophila, Nematode or
  Plant, plus the fgenesh button.

Paste your sequence into the first window, or load your file with the
nucleotide sequence in FASTA format.

Paste your protein sequence into the second window.
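
For readers who have a raw sequence string rather than a FASTA file, the
following sketch writes one FASTA record (a ">" header line followed by the
sequence in fixed-width lines). The function name and the 60-character line
width are choices made for this example, not requirements of the server.

```python
def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    # Drop any whitespace or line breaks inside the raw sequence.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("empty sequence")
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Tiny made-up sequence, wrapped at 4 characters to show the line breaks.
record = to_fasta("seq1", "acgt acgt\nac", width=4)
```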

     References: Salamov A.A., Solovyev V.V. (1999), unpublished data.
     Please cite the CGG WEB server:
     http://genomic.sanger.ac.uk/

     Fgenesh+ output:


      G - the number of the predicted gene (counted from the sequence start)
      Str - DNA strand (+, or - for complementary)
      Feature - type of coding sequence:
                 CDSf - first coding segment (starting with the start codon);
                 CDSi - internal (internal exon);
                 CDSl - last coding segment (ending with the stop codon)
      TSS - position of transcription start (TATA-box position and score)

      Start and End - positions of the Feature
      Weight - log likelihood*10 score for the feature (the Score column)
      ORF-start/end - positions where the complete codons start and end
      The last 3 values: length of exon, positions in protein, % of similarity
      with the target protein

          FGENESH+ Prediction of potential genes in Human      genomic DNA
          Time:   Mon Jul 26 21:38:41 1999
          Seq name: Adh_and_cact.1 (2919020 bases) 848501 853000 Protein - gi|2313041|gnl|PID|d1022564 Length  215 Sim: 90
          Length of sequence:  4500  GC content: 40 Zone: 1
          Number of predicted genes 1 in +chain 1 in -chain 0
          Number of predicted exons 4 in +chain 4 in -chain 0
          Positions of predicted genes and exons:
           G Str Feature    Start     End   Score        ORF           Len

           1 +   1 CDSi    2577 -    2690    197.66    2579 -    2689    111      1  -     35  100
           1 +   2 CDSi    2756 -    2936    312.35    2758 -    2934    177     37  -     95  100
           1 +   3 CDSi    2991 -    3173    307.82    2992 -    3171    180     97  -    156  100
           1 +   4 CDSl    3242 -    3419    301.90    3243 -    3419    177    158  -    215  100

         Predicted protein(s):
         >FGENESH   1   4 exon (s)   2577  -   3419    217 aa, chain +
         PNMTAAPYNYNYIFKYIIIGDMGVGKSCLLHQFTEKKFMANCPHTIGVEFGTRIIEVDDK
         KIKLQIWDTAGQERFRAVTRSYYRGAAGALMVYDITRRSTYNHLSSWLTDTRNLTNPSTV
         IFLIGNKSDLESTREVTYEEAKEFADENGLMFLEASAMTGQNVEEAFLETARKIYQNIQE
         GRLDLNASESGVQHRPSQPSRTSLSSEATGAKDQCSC
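
The exon rows of such an output can be read into records with a short
script. The sketch below is based only on the sample output shown here, so
the regular expression, the Exon record and its field names are assumptions
rather than a documented format. Whitespace is matched loosely so that rows
wrapped by a mail program are still recognised.

```python
import re
from typing import List, NamedTuple


class Exon(NamedTuple):
    gene: int
    strand: str
    number: int
    feature: str
    start: int
    end: int
    score: float
    orf_start: int
    orf_end: int
    length: int
    prot_start: int
    prot_end: int
    similarity: int


# One exon row; \s+ also matches a line break inside a wrapped row.
EXON_RE = re.compile(
    r"\b(\d+)\s+([+-])\s+(\d+)\s+(CDS[fil])\s+"
    r"(\d+)\s+-\s+(\d+)\s+(-?\d+(?:\.\d+)?)\s+"
    r"(\d+)\s+-\s+(\d+)\s+(\d+)\s+"
    r"(\d+)\s+-\s+(\d+)\s+(\d+)"
)


def parse_exons(text: str) -> List[Exon]:
    """Extract every exon row found in a block of FGENESH+ output text."""
    exons = []
    for m in EXON_RE.finditer(text):
        g = m.groups()
        exons.append(Exon(
            int(g[0]), g[1], int(g[2]), g[3],
            int(g[4]), int(g[5]), float(g[6]),
            int(g[7]), int(g[8]), int(g[9]),
            int(g[10]), int(g[11]), int(g[12]),
        ))
    return exons


# Rows copied from the sample output, wrapped as a mail program might.
SAMPLE = """
 1 +   1 CDSi    2577 -    2690    197.66    2579 -    2689    111
     1  -     35  100
 1 +   2 CDSi    2756 -    2936    312.35    2758 -    2934    177
    37  -     95  100
 1 +   3 CDSi    2991 -    3173    307.82    2992 -    3171    180
    97  -    156  100
 1 +   4 CDSl    3242 -    3419    301.90    3243 -    3419    177
   158  -    215  100
"""

exons = parse_exons(SAMPLE)
# In these sample rows, Len equals ORF end - ORF start + 1.
len_matches_orf = all(e.length == e.orf_end - e.orf_start + 1 for e in exons)
```

In the four sample rows the Len value equals the span between the ORF start
and end positions, which is what len_matches_orf checks. Whether that holds
for every FGENESH+ output is not stated in this message.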

-- 
Victor Solovyev
The Sanger Centre, Hinxton, Cambridge CB10 1SA, UK
Email: solovyev at sanger.ac.uk  http://genomic.sanger.ac.uk
Phone: 44-1223-494799  FAX:   44-1223-494919



