GenBank Errors

Keith Robison robison at chromo.harvard.edu
Tue Oct 22 13:29:03 EST 1991


Re: GenBank Errors

I have recently been involved in a project making heavy use of GenBank.
We have found many errors in the annotations of sequences.
What I find most appalling is the number of spelling errors in keywords,
as this could be corrected with a standard spell-check program.
(Given the nature of this comment, I hope I spelled "appalling" correctly!)
Among the other classes of errors I have detected:

  Feature listings off by one base                     
  Features listed with digits of the correct location transposed (987 instead of 978)
  Nuclear genes for organellar proteins listed in the organelle category
  Incorrect feature types (e.g. using a separate CDS for each individual
    coding exon, rather than a single CDS with joins to specify the
    coding sequence)
  Incomplete features missing the > or < that marks a partial location
  Non-existent features
  Missing feature line specifying the use of a non-standard genetic code
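
Two of the coordinate errors in this list (off by one base, transposed
digits) are mechanical enough to be flagged automatically once a trusted
position is available for comparison. The following is only a minimal
sketch in Python, not a tool anyone has actually run against GenBank; the
function name classify_location_error and the idea of having an "expected"
coordinate to compare against are assumptions made for illustration.

```python
def classify_location_error(reported: int, expected: int) -> str:
    """Guess which class of coordinate error separates two positions.

    Hypothetical helper: 'expected' is assumed to come from some trusted
    source, such as the original publication.
    """
    if reported == expected:
        return "ok"
    if abs(reported - expected) == 1:
        return "off-by-one"
    r, e = str(reported), str(expected)
    if len(r) == len(e):
        # Positions where the two numbers disagree, digit by digit.
        diffs = [i for i in range(len(r)) if r[i] != e[i]]
        # A transposition is exactly two adjacent digits swapped.
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and r[diffs[0]] == e[diffs[1]]
                and r[diffs[1]] == e[diffs[0]]):
            return "transposition"
    return "other"


print(classify_location_error(987, 978))
```

With the example from the list, 987 against 978 is classified as a
transposition. Anything that is neither off by one nor a swap of two
adjacent digits is simply reported as "other".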

Perhaps the most bizarre class of error I have witnessed is stuttering, where
a feature line is repeated many times for no apparent reason
(example: an exon specification repeated 6 times).
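
Stuttering of this kind should be easy to catch by counting identical
feature lines within an entry. Here is a rough sketch in Python, assuming
the feature lines of one entry have already been extracted as plain
strings; the function name find_stutters and the sample lines are made up
for illustration and are not taken from any real entry.

```python
from collections import Counter


def find_stutters(feature_lines, min_repeats=2):
    """Return feature lines that occur at least min_repeats times.

    Whitespace is collapsed before comparison so that differences in
    column alignment do not hide a repeat. Blank lines are ignored.
    """
    counts = Counter(
        " ".join(line.split()) for line in feature_lines if line.strip()
    )
    return {line: n for line, n in counts.items() if n >= min_repeats}


# Hypothetical input: one exon line repeated six times, one CDS line.
lines = ["exon   100..200"] * 6 + ["CDS    join(100..200,300..400)"]
print(find_stutters(lines))
```

For the made-up input above, only the exon line is reported, with a count
of 6. A legitimate entry could in principle contain two identical lines,
so a report from a check like this is a prompt for a human to look, not
proof of an error.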

This is clearly the mark of sloppy workmanship.



Keith Robison
Harvard University                                                
Program in Biochemistry, Cellular, Molecular, and Developmental Biology 


