GenBank Errors

Bruce Roe BROE at AARDVARK.UCS.UOKNOR.EDU
Sun Oct 20 18:38:00 EST 1991


Regarding the recent posting by lamoran at gpu.utcs.utoronto.ca

Bravo, well said.............

=> 
=> There are several examples of errors that are probably GenBank's fault and
=> in my experience these are quickly corrected when GenBank is alerted. The
=> problems are with those errors that are NOT GenBank's fault. It is not
=> obvious if, and how, such errors should be handled.
=> 
=> My own feeling is that it would be desirable for experts to cull the database
=> and make intelligent decisions about redundancy and errors. No data should
=> be omitted but it could be relegated to annotation. I doubt that there
=> will be many "volunteers" to do this job. 

The problem is *who* makes these "intelligent decisions" about *which*
sequence is *really* correct, and *how* they are made. The answer depends on
what we all perceive the database to be.  If I publish an article in a journal
and later find out I made a mistake in an interpretation, should all the
librarians worldwide rip those pages out of the journal and put a note in
their place saying sorry, but Bruce was wrong?  I'd rather see all our sins in
the database unless the original author agrees to change their original data
interpretation.

=> Incidentally, I believe that most 
=> sequences are no more than 99.4% accurate (i.e. 6 errors per 1000 nucleotides)
=> so we shouldn't get too upset about errors in the database.
=> 

Actually it is my experience that most sequences are between 98 and 99%
accurate, i.e. 1-2 errors per 100 nucleotides.  Don't forget that a lot of the
data in the databases was collected before thermostable polymerases,
7-deaza-dGTP and other methods for removing compressions and guessing.
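
To put the two estimates on the same footing, here is a minimal sketch of the
arithmetic. The function name `expected_errors` and the 1000-nucleotide
length are illustrative assumptions only; it simply converts a percent
accuracy into an expected error count and says nothing about where the errors
fall.

```python
def expected_errors(accuracy_percent: float, length: int) -> float:
    """Expected number of wrong bases in `length` nucleotides.

    Hypothetical helper for illustration: assumes errors are spread
    uniformly, so the count is just (1 - accuracy) * length.
    """
    if not 0.0 <= accuracy_percent <= 100.0:
        raise ValueError("accuracy_percent must be between 0 and 100")
    if length < 0:
        raise ValueError("length must be non-negative")
    return (100.0 - accuracy_percent) / 100.0 * length


# Compare the quoted 99.4% figure with the 98-99% range, per 1000 nucleotides.
for accuracy in (99.4, 99.0, 98.0):
    errors = expected_errors(accuracy, 1000)
    print(f"{accuracy}% accurate: about {errors:.0f} errors per 1000 nucleotides")
```

On these numbers, 98-99% accuracy works out to roughly two to three times as
many errors per 1000 nucleotides as the 99.4% figure.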

Cheers...........bruce roe


