Some comments about using The BCM Genefinder new service

Victor V. Solovyev solovyev at cmb.bcm.tmc.edu
Thu Aug 4 19:24:51 EST 1994



> From: bayer at darwin.med.utah.edu (Steve Bayer)
> Message-Id: <9408042039.AA25499 at darwin.med.utah.edu>
> To: solovyev at cmb.bcm.tmc.edu
> Subject: Re: New gene identification service

On August 4, 2:39pm, Steve Bayer wrote:

> Hi.  I've tried your gene id servers with good results when comparing
> the results with GRAIL and our in-house software called XPound.  The
> programs you have made available are very impressive.  Thank you.
>
> However, I'm having trouble understanding the difference between the
> various servers.  The descriptions you posted make some of them (e.g.,
> FGENEH, FEXH, HEXON) appear to do the same thing.  Could you please
> send a little more information about how the various programs (i.e.,
> FGENEH, FEXH, HEXON, and HSPL) differ?

Some comments about using the new BCM Genefinder service:
============================================================================

      The Baylor College Of Medicine Computational Biology Group
                             Houston, TX

The services are:
FGENEH - search for gene structure, with exons assembled by dynamic programming
FEXH   - search for 5'-, internal and 3'-exons
HEXON  - search for internal exons
HSPL   - search for splice sites
SSP    - prediction of alpha-helix and beta-strand segments in globular proteins

The first four programs are designed for human gene analysis, but they
also work well on rodent genes.

Short descriptions of the programs were sent to BIONEWS users a week ago
and can be requested from solovyev at cmb.bcm.tmc.edu.

The main differences between the programs:

1. FGENEH assembles exons, which is very important for constructing a true
   gene model. But if one of the internal exons is not predicted (for example,
   because of sequence errors in conserved splice site positions), the final
   gene structure may be significantly distorted. It is therefore useful to
   also predict the potential exons without assembly (with the FEXH program)
   and check how the exon predictions differ.

2. Sometimes FEXH predicts 5'- and 3'-exons that outcompete the flanking
   internal exons. If you have a partially sequenced gene that lacks the 5'-
   or 3'-flanking region, you can compare the results of FEXH and HEXON to
   reach a more accurate decision.

3. Splice site prediction is more reliable when you predict the gene
   structure (FGENEH) or exons (FEXH or HEXON). But for special analyses
   (for example, a search for alternative splicing variants) you may use
   splice site prediction by itself (HSPL).
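
The comparison of exon predictions from two programs suggested above can be
sketched roughly as follows. This is only an illustration: the function name
compare_exons and the coordinates are hypothetical, and the exon lists are
assumed to have already been extracted from the two mail replies as
(start, end) pairs.

```python
def compare_exons(first, second):
    """Compare two lists of (start, end) exon predictions.

    Returns the exons reported by both programs and those reported
    by only one of them. Only exact coordinate matches count as shared.
    """
    set_first, set_second = set(first), set(second)
    return {
        "both": sorted(set_first & set_second),
        "only_first": sorted(set_first - set_second),
        "only_second": sorted(set_second - set_first),
    }


# Hypothetical coordinates, for illustration only.
fexh_exons = [(380, 516), (611, 727), (839, 954)]
hexon_exons = [(611, 727), (839, 954), (1147, 1321)]

for label, exons in compare_exons(fexh_exons, hexon_exons).items():
    print(label, exons)
```

Exons that appear in only one list are the ones worth inspecting by hand.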

=================================================================================

REPEAT:

      The Baylor College Of Medicine Computational Biology Group
			     Houston, TX
		       announces a new service
			 The BCM Genefinder.

The services are FGENEH, FEXH, HEXON, HSPL, and SSP.

This message details FGENEH

NOTE: This service is temporarily being provided through the
University of Houston Gene-Server.  Only two jobs will be run at a
time.


				FGENEH
	 Prediction of gene structure in Human DNA sequences

Analysis of uncharacterized human sequences is available by sending the
file containing a sequence name and a sequence (no more than 80
char/line) to

	service at theory.bchs.uh.edu
with the subject line "FGENEH". 

Example: mail -s FGENEH service at theory.bchs.uh.edu < test.seq

where test.seq is a file containing the sequence.
 
Method description:
**********************
   The algorithm first predicts all potential internal exons, and a
   potential 5'- and 3'-exon for each internal exon, using linear discriminant
   functions that combine characteristics describing various contextual
   features of these exons. It then uses dynamic programming to search for
   the optimal combination of these exons and constructs a gene model.
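
To give a feeling for the dynamic programming step, here is a strongly
simplified sketch. It is not the FGENEH implementation: it assumes each
candidate exon already has a single numeric score, and it only finds the
highest-scoring set of non-overlapping exons, ignoring reading-frame
compatibility and exon type (5'-, internal, 3'-). The function name
best_exon_chain and the scores are hypothetical.

```python
from bisect import bisect_right


def best_exon_chain(candidates):
    """Return (total_score, chain) for the best non-overlapping exon set.

    candidates: list of (start, end, score) with inclusive coordinates.
    Classic weighted-interval dynamic programming over exons sorted by end.
    """
    exons = sorted(candidates, key=lambda e: e[1])
    ends = [e[1] for e in exons]
    # best[i] holds the optimal (score, chain) using only the first i exons.
    best = [(0.0, [])]
    for i, (start, end, score) in enumerate(exons):
        # j = number of earlier exons that end before this exon starts.
        j = bisect_right(ends, start - 1, 0, i)
        take_score = best[j][0] + score
        if take_score > best[i][0]:
            best.append((take_score, best[j][1] + [(start, end, score)]))
        else:
            best.append(best[i])
    return best[-1]


# Hypothetical candidate exons and scores, for illustration only.
candidates = [
    (100, 200, 0.9),
    (150, 260, 0.6),
    (300, 400, 0.8),
    (380, 450, 0.95),
    (500, 600, 0.7),
]
total, chain = best_exon_chain(candidates)
print(total, [(s, e) for s, e, _ in chain])
```

A real gene model would also need the chosen exons to keep a consistent
reading frame, which this sketch leaves out.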
   
Accuracy:
************
   Accuracy was estimated on a set of 212 complete human genes extracted
   from GenBank and compared with the accuracy of the Grail-2 email server
   on the same data set. Note that these sequences are not independent of
   the "Fgeneh" and "Grail-2" training data.

   Test1 contains the nucleotide sequences from 150 bp before the first
   coding region to 150 bp after the last coding region.
   Test2 contains the nucleotide sequences of whole GenBank entries.

                     Test1                    Test2
                     Fgeneh     Grail-2       Fgeneh     Grail-2

Exact exons          80%        39%           73%        39%

Exon nucleotides     91%(0.88)  76%(0.74)     90%(0.82)  73%(0.75)


The numbers in () are the correlation coefficients.
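
The message does not say exactly how these measures were computed. The
sketch below shows one common way to obtain such numbers and rests on
several assumptions: "exact exons" is taken as the fraction of true exons
predicted with both boundaries correct, "exon nucleotides" as the fraction
of true exon nucleotides covered by predictions, and the correlation
coefficient as the usual one built from nucleotide-level true/false
positives and negatives. The function name exon_accuracy is hypothetical.

```python
from math import sqrt


def exon_accuracy(true_exons, predicted_exons, seq_length):
    """Return (exact_exon_fraction, nucleotide_sensitivity, correlation).

    Exons are (start, end) pairs with inclusive coordinates.
    These definitions are assumptions, not taken from the announcement.
    """
    true_pos = {p for s, e in true_exons for p in range(s, e + 1)}
    pred_pos = {p for s, e in predicted_exons for p in range(s, e + 1)}
    tp = len(true_pos & pred_pos)          # exon nucleotides predicted as exon
    fp = len(pred_pos - true_pos)          # non-exon predicted as exon
    fn = len(true_pos - pred_pos)          # exon nucleotides missed
    tn = seq_length - tp - fp - fn         # non-exon predicted as non-exon
    exact = len(set(true_exons) & set(predicted_exons)) / len(true_exons)
    sensitivity = tp / (tp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    correlation = (tp * tn - fp * fn) / denom if denom else 0.0
    return exact, sensitivity, correlation


# Hypothetical toy example: two true exons, the second predicted too short.
print(exon_accuracy([(11, 20), (41, 50)], [(11, 20), (41, 45)], 100))
```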
		
For exon prediction in partially sequenced genes you can use "fexh"
(5'-, internal and 3'-exon prediction) and "hexon" (internal exon
prediction), see below.

Submitting sequences via email:
*******************************

  For email submission the sequences must have the following format:  

Name of your  sequence
ccatctctgtcttgcaggacaatgccgtcttctgtctcgtggggcatcctcctgctggca
ggcctgtgctgcctggtccctgtctccctggctgaggatccccagggagatgctgcccag
aagacagatacatcccaccatgatcaggatcacccaaccttcaacaagatcacccccaac
ctggctgagttcgccttcagcctataccgccagctggcacaccagtccaacagcaccaat
atcttcttctccccagtgagcatcg...............

   (Restrict the line length to 80 characters or less).

   You have to send the file containing the sequence to: 

   service at theory.bchs.uh.edu

   Subject line must be:
   FGENEH

   Example: mail -s FGENEH service at theory.bchs.uh.edu < test.seq
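
If your sequence is not yet in this format, a small script can prepare the
file. The sketch below is only an illustration: the function name
format_submission and the default of 60 characters per line are my own
choices (any width up to 80 satisfies the limit above).

```python
def format_submission(name, sequence, width=60):
    """Build the text of a submission file: a name line, then the sequence.

    Whitespace inside the sequence is removed and the sequence is
    re-wrapped so that no line exceeds `width` characters (at most 80).
    """
    if not 0 < width <= 80:
        raise ValueError("width must be between 1 and 80")
    seq = "".join(sequence.split())
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return name + "\n" + "\n".join(lines) + "\n"


# Write test.seq, ready to be mailed with the subject line FGENEH.
if __name__ == "__main__":
    text = format_submission("Name of your sequence", "ccatctctgtcttgcag" * 10)
    with open("test.seq", "w") as handle:
        handle.write(text)
```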

Fgeneh output:		
****************
   1st line - name of your sequence
   2nd line - length of your sequence
   3rd line - number of potential exons
   4th and following lines - positions of predicted exons
   For example:
   HUMALPHA     4556 bp ds-DNA             PRI       15-SEP-1 
   length of sequence -   4556
   number of potential exon:  10
   380 -    516 
   611 -    727
   839 -    954 
  1147 -   1321
  1819 -   1953 
  2053 -   2125
  2254 -   2388
  2470 -   2661 
  2881 -   2997
  3120 -   3562 
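
A reply in this layout is easy to read into a program. The sketch below
assumes the reply looks exactly like the example above (name line, length
line, exon count line, then one "start - end" pair per line); the function
name parse_fgeneh_output is hypothetical, and real replies may contain
additional mail headers that would have to be stripped first.

```python
import re

EXON_LINE = re.compile(r"^(\d+)\s*-\s*(\d+)$")


def parse_fgeneh_output(text):
    """Parse an Fgeneh reply laid out as in the example above."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    exons = []
    for ln in lines[3:]:
        match = EXON_LINE.match(ln)
        if match:
            exons.append((int(match.group(1)), int(match.group(2))))
    return {
        "name": lines[0],
        "length": int(lines[1].split("-")[-1]),
        "declared_exons": int(lines[2].split(":")[-1]),
        "exons": exons,
    }


sample = """\
   HUMALPHA     4556 bp ds-DNA             PRI       15-SEP-1
   length of sequence -   4556
   number of potential exon:  10
   380 -    516
   611 -    727
   839 -    954
  1147 -   1321
  1819 -   1953
  2053 -   2125
  2254 -   2388
  2470 -   2661
  2881 -   2997
  3120 -   3562
"""
result = parse_fgeneh_output(sample)
print(result["length"], result["declared_exons"], result["exons"][0])
```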

References:

  1. Solovyev V.V., Salamov A.A., Lawrence C.B.
     Predicting internal exons by oligonucleotide composition and
     discriminant analysis of spliceable open reading frames.
     Nucl. Acids Res. (1994, in press).
  2. Solovyev V.V., Salamov A.A., Lawrence C.B.
     The prediction of human exons by oligonucleotide composition and
     discriminant analysis of spliceable open reading frames.
     In: The Second International Conference on Intelligent Systems
     for Molecular Biology (eds. Altman R., Brutlag D., Karp R.,
     Lathrop R. and Searls D.), AAAI Press, Menlo Park, CA
     (1994, in press).
  3. Solovyev V., Lawrence C.B.
     Prediction of human gene structure using dynamic programming
     and oligonucleotide composition. In: Abstracts of the 4th Annual
     Keck Symposium, Pittsburgh, 47, 1993.

Problems, comments, and suggestions:
   can be mailed to solovyev at cmb.bcm.tmc.edu.


