GENEID - Online Prediction of Gene Structure
Kay Hofmann
KHOFMANN at cipvax.biolan.Uni-Koeln.DE
Tue Dec 3 06:01:39 EST 1991
In <robison.691707287 at ribo> robison at ribo.harvard.edu writes:
> OK, the question which is probably on everyone's minds. Since they sound
> rather similar, how do GeneID and GRAIL stack up against each other?
> (or are they just two names for the same thing?)
>
> Keith Robison
> Harvard University
> Program in Biochemistry, Molecular, Cellular, and Developmental Biology
Well, when I saw the two announcements (and before I read them thoroughly),
I also had the impression that GRAIL and GENEID do the same thing. They do
share a goal, finding exons in large stretches of DNA, but their approaches
are completely different.
Simply put, GRAIL looks at the local base composition and detects regions
where it looks exon-like. It uses a neural-network approach and therefore
doesn't really know what it is looking for, but it recognizes an exon when it
sees one. Or at least it is supposed to.
GRAIL only tells the user which regions of the sequence might contain exons;
it does not try to define the exon boundaries. It does detect open reading
frames, though.
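To make "looking at the local base composition" a bit more concrete, here is a
minimal sketch that slides a fixed window along a sequence and reports the GC
fraction of each window. This is NOT GRAIL's method: GRAIL feeds several such
statistics into a neural network. The function name window_gc and the default
window size and step are hypothetical choices for illustration only.

```python
def window_gc(seq: str, window: int = 100, step: int = 50) -> list[tuple[int, float]]:
    """Return (start, GC fraction) for every full window along seq.

    A toy composition statistic, not GRAIL's scoring; window and step
    are arbitrary illustrative defaults.
    """
    seq = seq.upper()
    results = []
    # Only full-length windows are scored; a trailing partial window is skipped.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        results.append((start, gc / window))
    return results
```

A composition-based predictor would then flag windows whose statistics look
"exon-like"; note that such windows say nothing about where an exon begins or
ends, which matches the limitation described above.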
GENEID, on the other hand, ignores base composition and statistics and instead
applies well-known rules for finding exons: it looks for potential splice sites
and open reading frames. It also distinguishes between first exons (no splice
acceptor at their start), last exons (no splice donor at their end) and
internal exons. GENEID not only reports potential exons but also tries to
assemble them into genes.
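The rule-based idea can be sketched as follows, assuming only the canonical
GT...AG splice-site rule and simple ATG-to-stop open reading frames on the
forward strand. GENEID's actual rules and scoring are more elaborate; the
function names below are hypothetical and this is not GENEID's code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice_site_candidates(seq: str) -> tuple[list[int], list[int]]:
    """0-based positions of canonical donor (GT) and acceptor (AG) dinucleotides.

    A donor position points at the G of GT (first intron base); an acceptor
    position points at the A of AG (second-to-last intron base).
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


def open_reading_frames(seq: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """(start, end) of ATG...stop ORFs in the three forward frames.

    end is exclusive and includes the stop codon; min_codons counts the
    codons before the stop (ATG included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)
```

A real gene finder would combine these candidates, e.g. requiring an exon to
run from an acceptor to a donor while staying in an open frame, and then score
the combinations; that assembly step is omitted here.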
To compare the two programs, I tried them on the only long, completely
sequenced gene from our lab. It is 18 kb long and contains 7 exons. The first
one is extremely short (4 bp); the rest range between 80 and 200 bp. There is
one large intron of approx. 12 kb and several shorter ones.
-GRAIL
GRAIL found two 'excellent' coding regions, representing two of the exons, and
one 'marginal' region representing a third. It completely missed the other
four exons, although I have to admit that all of them were shorter than
100 bp, the minimum exon length GRAIL claims to detect.
GRAIL also found an 'excellent' coding region on the reverse strand (in the
promoter region of the 'real' gene). I took a closer look at this region but
could not detect any exon/intron boundaries, so I think GRAIL is wrong about
this one.
-GENEID
It took quite some time to get GENEID to accept the sequence. I didn't know
that everything had to be in uppercase and that base ambiguity symbols are
not allowed.
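Since those two input requirements cost time, here is a minimal sketch of a
pre-submission check, assuming the requirements are exactly as described
above (uppercase A/C/G/T only). The ambiguity set follows the standard IUPAC
nucleotide codes; the function name prepare_sequence and the choice to strip
whitespace and digits (as found in GenBank-style listings) are hypothetical.

```python
import re

# IUPAC ambiguity codes (anything beyond plain A, C, G, T).
IUPAC_AMBIGUITY = set("RYSWKMBDHVN")


def prepare_sequence(raw: str) -> str:
    """Uppercase a raw sequence, drop whitespace and digits, allow only A/C/G/T.

    Raises ValueError instead of guessing a replacement for ambiguous bases.
    """
    seq = re.sub(r"[\s\d]", "", raw).upper()
    # Positions are 0-based indices into the cleaned sequence.
    ambiguous = [i for i, base in enumerate(seq) if base in IUPAC_AMBIGUITY]
    if ambiguous:
        raise ValueError(f"ambiguity symbols at positions {ambiguous[:10]}")
    invalid = sorted(set(seq) - set("ACGT"))
    if invalid:
        raise ValueError(f"unexpected characters: {''.join(invalid)}")
    return seq
```

Rejecting ambiguous bases, rather than silently substituting one, leaves the
decision about each uncertain position to the person who sequenced it.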
But finally it did work and returned lots of potential exons with associated
scores. The actual first and last exons were among the top three scorers, and
four of the five internal exons were ranked between 2 and 11. One of the exons
was missing.
The suggested gene structures bore little resemblance to the actual gene;
GENEID found lots of 20 bp exons in the large intron.
In summary, I have the impression that both programs are valuable if one has
large amounts of genomic sequence or plans to look for cryptic exons. Of
course, as with all predictive software, one must not trust the results
blindly, but who would do that?
I would appreciate a program that combines both approaches.
Best regards,
Kay
------------------------------------------------------------------------
Kay Oliver Hofmann Tel. ++49 201 478 6980
Institut fuer Biochemie (med. Fak.) FAX ++49 201 478 6979
Universitaet Koeln
Joseph Stelzmann Str. 52 INTERNET:
D-5000 Koeln 41 KHOFMANN at cipvax.biolan.uni-koeln.de
------------------------------------------------------------------------
More information about the Bio-soft
mailing list