GenBank Errors
Bruce Roe
BROE at AARDVARK.UCS.UOKNOR.EDU
Sun Oct 20 18:38:00 EST 1991
Regarding the recent posting by lamoran at gpu.utcs.utoronto.ca
Bravo, well said.............
=>
=> There are several examples of errors that are probably GenBank's fault and
=> in my experience these are quickly corrected when GenBank is alerted. The
=> problems are with those errors that are NOT GenBank's fault. It is not
=> obvious if, and how, such errors should be handled.
=>
=> My own feeling is that it would be desirable for experts to cull the database
=> and make intelligent decisions about redundancy and errors. No data should
=> be omitted but it could be relegated to annotation. I doubt that there
=> will be many "volunteers" to do this job.
The problem is *who* makes these "intelligent decisions" about *which*
sequence is *really* correct, and *how*; the answer depends on what we all
perceive the database to be. If I publish an article in a journal and later
find out I made a mistake in an interpretation, should all the librarians
worldwide rip those pages out of the journal and put in a note saying "sorry,
but Bruce was wrong"? I'd rather see all our sins in the database unless the
original author agrees to change their original data interpretation.
=> Incidentally, I believe that most
=> sequences are no more than 99.4% accurate (i.e. 6 errors per 1000 nucleotides)
=> so we shouldn't get too upset about errors in the database.
=>
Actually, in my experience the accuracy is between 98 and 99%, i.e. 1-2
errors per 100 nucleotides. Don't forget that a lot of the data in the
databases was collected before thermostable polymerases, 7-deaza-dGTP and
other methods for resolving band compressions that otherwise left us guessing.
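To make the arithmetic behind these figures concrete, here is a minimal sketch that converts a per-base accuracy into expected errors per sequence. It assumes every base is miscalled independently with the same probability, which real gel reads do not guarantee (compressions cluster errors), and the function names are purely illustrative.

```python
import math

# Minimal sketch: turn a per-base accuracy (in percent) into error counts.
# ASSUMPTION: each base is wrong independently with the same probability;
# real sequencing errors cluster (e.g. at band compressions).

def expected_errors(accuracy_percent: float, length: int) -> float:
    """Expected number of wrong bases in a sequence of `length` nucleotides."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return error_rate * length


def prob_error_free(accuracy_percent: float, length: int) -> float:
    """Probability that all `length` bases are correct (independence assumed)."""
    return math.pow(accuracy_percent / 100.0, length)


if __name__ == "__main__":
    # The two estimates quoted in this thread.
    for acc, n in [(99.4, 1000), (99.0, 100), (98.0, 100)]:
        print(f"{acc}% accurate: ~{expected_errors(acc, n):.1f} errors per {n} nt")
    print(f"P(no errors in 1000 nt at 99.4%) = {prob_error_free(99.4, 1000):.4f}")
```

Under this simple model, 99.4% accuracy gives about 6 errors per 1000 nt and 98-99% gives 1-2 per 100 nt, and a 1000-nt entry at 99.4% has only about a 0.2% chance of being entirely error-free.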
Cheers...........bruce roe
More information about the Bioforum mailing list