GenBank Errors
Keith Robison
robison at chromo.harvard.edu
Tue Oct 22 13:29:03 EST 1991
Re: GenBank Errors
I have recently been involved in a project making heavy use of GenBank.
We have found many errors in the annotations of sequences.
What I find most appalling is the number of spelling errors in keywords,
as this could be corrected with a standard spell-check program.
(Given the nature of this comment, I hope I spelled appalling correctly!)
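As a rough illustration of the kind of check meant here, the sketch below compares keywords against a known vocabulary and suggests the closest match for anything unrecognized. The vocabulary list and the function name are hypothetical, invented for this example. A real check would need a curated keyword list.

```python
import difflib

# Hypothetical controlled vocabulary, for illustration only.
KNOWN_KEYWORDS = ["ribosomal protein", "transfer RNA", "cytochrome", "polymerase"]

def suggest_keyword_fixes(keywords, vocabulary=KNOWN_KEYWORDS, cutoff=0.8):
    """Return {keyword: closest vocabulary entry} for keywords not in the vocabulary."""
    known = {v.lower(): v for v in vocabulary}
    fixes = {}
    for kw in keywords:
        if kw.lower() in known:
            continue  # exact match (ignoring case), nothing to fix
        # Closest vocabulary entry by difflib's similarity ratio, if any is close enough
        matches = difflib.get_close_matches(kw.lower(), list(known), n=1, cutoff=cutoff)
        if matches:
            fixes[kw] = known[matches[0]]
    return fixes

print(suggest_keyword_fixes(["cytochrome", "polymerse"]))
```

Keywords with no close match are left out of the result, so genuinely new terms are not "corrected" into something else.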
Among the other classes of errors I have detected:
Feature listings off by one base
Feature locations with transposed digits (987 instead of 978)
Nuclear genes for organellar proteins listed in the organelle category
Incorrect feature types (e.g. using a separate CDS for each individual coding
exon, rather than a single CDS with a join to specify the coding sequence)
Incomplete features lacking the > or < that marks them as partial
Non-existent features
Missing feature line to specify the use of a non-standard genetic code
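Some of these location errors, such as off-by-one and transposed coordinates, can often be caught by checking a CDS against the sequence itself. The following is a minimal sketch, not a complete validator. It assumes a complete, unspliced, forward-strand CDS and the standard genetic code, so entries using a non-standard code or an alternative start codon would be flagged falsely. The function name and warning texts are made up for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only (assumption)

def check_cds(sequence, start, end):
    """Return warnings for a complete forward-strand CDS at 1-based, inclusive start..end."""
    if not (1 <= start <= end <= len(sequence)):
        return ["location falls outside the sequence"]
    warnings = []
    cds = sequence[start - 1:end].upper()  # convert 1-based inclusive to a Python slice
    if len(cds) % 3 != 0:
        warnings.append("length is not a multiple of 3")
    if not cds.startswith("ATG"):
        warnings.append("does not begin with ATG")
    if cds[-3:] not in STOP_CODONS:
        warnings.append("does not end with a stop codon")
    return warnings

seq = "CCATGAAATAGGG"
print(check_cds(seq, 3, 11))  # correct location
print(check_cds(seq, 3, 12))  # end coordinate off by one
```

A clean result does not prove the location is right. It only means these three cheap tests found nothing to complain about.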
Perhaps the most bizarre class of error I have witnessed is stuttering, where
a feature line is repeated many times for no apparent reason
(example: an exon specification repeated 6 times).
This is clearly the mark of sloppy workmanship.
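Stuttering of this kind is mechanical to detect. The sketch below assumes the feature table has already been parsed into (feature key, location) pairs. That representation and the function name are assumptions made for illustration, not part of any GenBank tool.

```python
from collections import Counter

def find_stuttered_features(features):
    """Return {(feature_key, location): count} for entries listed more than once."""
    counts = Counter((key.strip(), location.strip()) for key, location in features)
    return {entry: n for entry, n in counts.items() if n > 1}

# An exon specification repeated 6 times, plus one legitimate CDS
features = [("exon", "100..200")] * 6 + [("CDS", "100..500")]
print(find_stuttered_features(features))
```

Exact duplicates are reported with their counts, so a reviewer can decide whether a repeat is an error or intentional.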
Keith Robison
Harvard University
Program in Biochemistry, Cellular, Molecular, and Developmental Biology