I have been working on a program that identifies coding sequences
from genomic DNA. I have also been studying the performance of
a number of the more recent programs/papers on the topic. I have
been struck by the singular lack of consistency in the way the
performance of these systems is evaluated. While some of this
inconsistency is due to differences in the limitations or intent
of the different programs, I think it would be helpful to everyone
to develop a set of standard performance measures by which to
judge the success of each new method.
I would like to see the following data:

Results in terms of complete sequences:
    Exons (in)correctly predicted (both donor and acceptor sites correct)
    Exons partially predicted (one donor or acceptor site correct)
    Exons partially predicted (prediction overlaps actual exon but
        boundaries incorrect)
    Number of exons (of all types) for which the reading frame is
        (in)correctly predicted
Results in terms of nucleotides:
    Nucleotides (in)correctly predicted as exon
    Correlation coefficient for exon prediction
Results in terms of splice sites:
    Correlation coefficient for splice site predictions
Results in terms of assembled genes:
    Number of amino acids in predicted protein (in)correctly predicted
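
To make the exon-level categories above concrete, here is a minimal Python sketch of how they might be tallied. It assumes exons are given as (start, end) pairs with inclusive coordinates on the same strand; the function names and category labels (classify_exon, exon_summary, "exact", "one_boundary", "overlap", "missed", "false_positive") are made up for illustration and do not come from any of the programs mentioned. Reading frame is not handled.

```python
from collections import Counter

# Ranking used to keep the best match when several predictions hit one exon.
RANK = {"missed": 0, "overlap": 1, "one_boundary": 2, "exact": 3}


def overlaps(a, b):
    """True if two inclusive (start, end) intervals share at least one position."""
    return not (a[1] < b[0] or a[0] > b[1])


def classify_exon(actual, predicted_exons):
    """Return the best category describing how one actual exon was predicted."""
    best = "missed"
    for pred in predicted_exons:
        if not overlaps(actual, pred):
            continue
        # Count how many of the two boundaries agree exactly (0, 1 or 2).
        matches = (pred[0] == actual[0]) + (pred[1] == actual[1])
        label = ("overlap", "one_boundary", "exact")[matches]
        if RANK[label] > RANK[best]:
            best = label
    return best


def exon_summary(actual_exons, predicted_exons):
    """Tally actual exons by category, plus predictions that hit no exon."""
    counts = Counter(classify_exon(a, predicted_exons) for a in actual_exons)
    # Predicted exons overlapping no actual exon are counted as false positives.
    counts["false_positive"] = sum(
        1
        for p in predicted_exons
        if not any(overlaps(p, a) for a in actual_exons)
    )
    return dict(counts)


if __name__ == "__main__":
    actual = [(10, 20), (30, 40), (50, 60), (70, 80)]
    predicted = [(10, 20), (30, 45), (52, 58), (90, 100)]
    print(exon_summary(actual, predicted))
```

Reporting all five counts, rather than only the exactly predicted exons, is what lets a reader recover both the miss rate and the false positive rate.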
I find that most papers on the subject analyze their data in terms
of one or two of these categories. Often it is difficult to get a
feeling for the rates of false positives, and it is never possible to
calculate the correlation coefficients mentioned above from the raw data.
While I appreciate that one must work within space limitations, there are
so many programs of this type appearing in the literature that it is
important to be able to attempt some objective comparison.
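
As an example of the raw data needed, the nucleotide-level correlation coefficient can be computed once the four counts of a 2x2 table are reported: true positives, false positives, false negatives and true negatives. The Python sketch below assumes the coefficient in question is the Matthews (phi) correlation coefficient, which is my assumption and not something fixed above. It also assumes inclusive (start, end) exon coordinates that fall within the sequence; the function names are hypothetical.

```python
import math


def nucleotide_counts(actual_exons, predicted_exons, seq_length):
    """Return (tp, fp, fn, tn) counted per nucleotide.

    Exons are inclusive (start, end) pairs; seq_length is the total
    number of nucleotides in the sequence being evaluated.
    """
    actual = set()
    for start, end in actual_exons:
        actual.update(range(start, end + 1))
    predicted = set()
    for start, end in predicted_exons:
        predicted.update(range(start, end + 1))

    tp = len(actual & predicted)   # coding, predicted coding
    fp = len(predicted - actual)   # non-coding, predicted coding
    fn = len(actual - predicted)   # coding, predicted non-coding
    tn = seq_length - tp - fp - fn # non-coding, predicted non-coding
    return tp, fp, fn, tn


def correlation_coefficient(tp, fp, fn, tn):
    """Matthews (phi) coefficient; None when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return None  # e.g. nothing predicted, or no true exons at all
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    counts = nucleotide_counts([(10, 19)], [(15, 24)], seq_length=100)
    print(counts, correlation_coefficient(*counts))
```

A paper that reports only the fraction of coding nucleotides found gives tp and fn but not fp or tn, which is why the coefficient cannot be reconstructed afterwards.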
So, since I too am working on this problem, I would like to compile
a wish-list of statistics that bionet readers would like to see.
What critical bit of information have you found lacking in the analysis
of GeneID, GeneModeler, GRAIL, NETGENE, SORFIND, or whatever program
you have been using? I will compile a list of such stats and post it if
there is sufficient interest.
Thanks,
Eric E. Snyder
Department of MCD Biology ...making feet for childrens' shoes.
University of Colorado, Boulder
Boulder, Colorado 80309-0347