GENEID - Online Prediction of Gene Structure

Kay Hofmann KHOFMANN at cipvax.biolan.Uni-Koeln.DE
Tue Dec 3 06:01:39 EST 1991


In <robison.691707287 at ribo> robison at ribo.harvard.edu writes:

> OK, the question which is probably on everyone's minds.  Since they sound 
> rather similar, how do GeneID and GRAIL stack up against each other?
> (or are they just two names for the same thing?)
> 
> Keith Robison
> Harvard University
> Program in Biochemistry, Molecular, Cellular, and Developmental Biology

Well, when I saw the two announcements (and before I started reading them
thoroughly) I also had the impression that GRAIL and GENEID do the same thing.

Both try to find exons in large stretches of DNA, but their approaches are
completely different. 

GRAIL, simply put, looks at the local base distribution and detects
regions where it looks exon-like. It uses a neural-network approach and
therefore doesn't really know what it is looking for, but it recognizes
an exon if it sees one. Or, at least, it is supposed to.
GRAIL only tells the user in which regions of the sequence there might be
exons, but it does not care about defining the boundaries. It does detect
open reading frames, though.

GENEID, on the other hand, doesn't care about base composition and statistics,
but applies well-known rules for finding exons. It looks for potential
splice sites and open reading frames. It also differentiates between
first exons (lacking a real start), last exons (lacking a real end) and
internal exons. GENEID not only shows potential exons but also tries to put
them together to form genes.
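
To illustrate the rule-based idea, here is a minimal Python sketch of how
candidate internal exons might be enumerated: an AG acceptor dinucleotide
upstream, a GT donor dinucleotide downstream, and at least one reading frame
without a stop codon in between. This is not GENEID's actual algorithm or
scoring; the function names and the length limits are assumptions made only
for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_frames(segment):
    """Return the frames (0, 1, 2) in which segment has no in-frame stop codon."""
    frames = []
    for frame in range(3):
        codons = (segment[i:i + 3] for i in range(frame, len(segment) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            frames.append(frame)
    return frames


def candidate_internal_exons(seq, min_len=20, max_len=300):
    """List (start, end, frames) for AG|exon|GT spans open in at least one frame.

    start and end are 0-based, end is exclusive. min_len and max_len are
    illustrative limits (assumptions), not values taken from GENEID.
    """
    # An exon would begin right after an AG acceptor and end right before a GT donor.
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    hits = []
    for start in acceptors:
        for end in donors:
            if min_len <= end - start <= max_len:
                frames = open_frames(seq[start:end])
                if frames:
                    hits.append((start, end, frames))
    return hits


if __name__ == "__main__":
    toy = "CCAG" + "ATGGCCGCCGCCGCCGCCGCC" + "GTCC"
    for start, end, frames in candidate_internal_exons(toy):
        print(start, end, frames)
```

Because AG and GT occur by chance very often, a bare rule like this returns
many short false candidates, so some scoring of sites and frames is needed on
top of it.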

To compare both programs, I tried them with the only long and completely
sequenced gene from our lab. It is 18 kb long and contains 7 exons. The first
one is extremely short (4 bp), the rest range between 80 and 200 bp. There is
one large intron of approx. 12 kb and some shorter ones.

-GRAIL
found 2 'excellent' coding regions, representing two of the exons, and one
'marginal' region representing a third one. It completely missed the
other 4 exons, although I have to admit that all of those were slightly
shorter than 100 bp, which is the minimum exon length GRAIL claims to find.
GRAIL also found an excellent coding region on the reverse strand (in the
promoter region of the 'real' gene). I took a closer look at this region, but
could not detect any exon/intron boundaries, so I think GRAIL is in error
about this one.

-GENEID
It took quite some time to get GENEID to swallow the sequence. I didn't know
that everything had to be in uppercase and that ambiguity symbols are not
allowed.
But, finally, it did work and returned lots of potential exons with associated
scores. The actual first and last exons were among the 3 top scorers; 4 of the
5 internal exons were distributed between ranks 2 and 11. One of the exons was
missing.
The suggested gene structures did not have much similarity to the actual gene;
GENEID found lots of 20 bp exons in the large intron.
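
For anyone hitting the same input problem, a small Python sketch along these
lines could prepare a sequence before submission. It assumes that only the
uppercase letters A, C, G and T are accepted, which is inferred from the
restrictions described above and not checked against the GENEID
documentation; the function name is made up for this example.

```python
def prepare_sequence(raw):
    """Uppercase a DNA sequence, drop non-letters, and report non-ACGT symbols.

    Returns (cleaned, ambiguous), where ambiguous is a list of
    (0-based position, symbol) pairs for every letter other than A, C, G, T.
    Assumption: the server accepts only uppercase A, C, G and T.
    """
    # Whitespace, digits and punctuation (e.g. from a formatted listing) are dropped.
    cleaned = "".join(ch for ch in raw.upper() if ch.isalpha())
    ambiguous = [(pos, ch) for pos, ch in enumerate(cleaned) if ch not in "ACGT"]
    return cleaned, ambiguous


if __name__ == "__main__":
    cleaned, ambiguous = prepare_sequence("acgt nry\n12 ACG")
    print(cleaned)
    print(ambiguous)
```

The sketch only reports ambiguity symbols and leaves them in place, since how
to resolve them (re-check the sequencing data, or substitute a base) is a
decision for the user.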

In summary, I have the impression that both programs have their value
if one has large amounts of genomic sequence or if one plans to look
for cryptic exons. Of course, and this holds for all predictive software, one
must not trust the results blindly. But who would do that?
I would appreciate it if there were a program that combined both approaches.

Best regards,
               Kay

------------------------------------------------------------------------
Kay Oliver Hofmann                        Tel. ++49 201 478 6980
Institut fuer Biochemie (med. Fak.)       FAX  ++49 201 478 6979
Universitaet Koeln
Joseph Stelzmann Str. 52            INTERNET:
D-5000 Koeln 41                     KHOFMANN at cipvax.biolan.uni-koeln.de
------------------------------------------------------------------------



