GenBank Errors

Sanjay Kumar at Cold Spring Harbor Lab kumar at CSHL.ORG
Mon Oct 21 15:16:50 EST 1991

owen white (owhite at wrote:

> database.  But I was told that they were not planning on implementing
> any software to detect these errors.  Code to find these errors would
> be about 200 lines.

Has this code been written?  If so, it would be valuable to release it to
the public so that it can be evaluated and improved upon.  As Paul
(pgil at pointed out:

> However, with the advent of automated submission tools that are
> now a reality we can begin to capitalise on this by providing
> that same level of data integrity checking AT THE LAB BENCH.

This seems like a very rational direction in which to proceed.  If software
were available for identifying possible errors in data *PRIOR* to submission,
the database curators would be helped.  Data flagged as containing
potential errors could and should be rechecked by the scientist, just as
with any other data that goes into a paper.  Such a data validation
program should be able to flag possible sequencing errors, inclusion of
vector sequence, and inconsistencies relating to merging entries.  A
server-based program would be valuable.
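
To make the idea concrete, here is a rough sketch of what a pre-submission
check might look like.  It is only an illustration, not anyone's existing
code: the function names, the run-length threshold, and the single "vector"
fragment (a short polylinker-style string used as a placeholder) are all
assumptions.  A real tool would need a curated vector library and
approximate matching rather than the exact substring search used here.

```python
import re

# Placeholder vector fragment, for illustration only (an assumption, not a
# curated library entry).
VECTOR_FRAGMENTS = {
    "example_polylinker": "GAATTCGAGCTCGGTACCCGGGGATCC",
}


def reverse_complement(seq):
    """Reverse complement of an unambiguous (A/C/G/T) DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def check_sequence(seq, vectors=VECTOR_FRAGMENTS, max_n_run=5):
    """Return a list of warnings; an empty list means nothing was flagged."""
    seq = seq.upper()
    warnings = []

    # Characters outside the IUPAC nucleotide alphabet are likely typos.
    bad = sorted(set(re.findall(r"[^ACGTRYSWKMBDHVN]", seq)))
    if bad:
        warnings.append("non-IUPAC characters: " + "".join(bad))

    # Long runs of N suggest an unresolved stretch of the sequence.
    for m in re.finditer(r"N{%d,}" % max_n_run, seq):
        warnings.append(
            "run of %d N's at position %d" % (len(m.group()), m.start() + 1)
        )

    # Exact match to a known vector fragment on either strand.
    for name, frag in vectors.items():
        probes = (("forward", frag), ("reverse", reverse_complement(frag)))
        for strand, probe in probes:
            pos = seq.find(probe)
            if pos != -1:
                warnings.append(
                    "possible vector sequence %s (%s strand) at position %d"
                    % (name, strand, pos + 1)
                )

    return warnings


if __name__ == "__main__":
    for line in check_sequence("ACGTNNNNNNACGTGAATTCGAGCTCGGTACCCGGGGATCC"):
        print(line)
```

The checks only raise warnings; deciding whether a flagged region is a real
error is left to the scientist, as suggested above.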

Tom Schneider (toms at wrote:

> I've been suggesting solutions, such as named objects and merged entries, for
> 10 years. 

How about *detailing* those suggestions here so everyone could evaluate the 
proposed solutions?  

Sanjay Kumar
Cold Spring Harbor Lab
kumar at
