more on GenBank errors
owhite at nmsu.edu
Wed Oct 23 23:31:36 EST 1991
In article <9110231652.AA14074 at primate.cshl.org> marr at CSHL.ORG (Thomas G. Marr) writes:
> I guess I should respond to some of the criticisms leveled against my
> original response to Tom Schneider's remarks about errors in the
> GenBank database...
> I think what disturbed me most about his remarks was the fact that
> blanket assertions were made with no supporting data or statistics.
> For example, if he has encountered errors in the database, then we
> should have the details:
> 1. How many errors have been encountered in the database? How many
> entries have been examined? If there are, say, 30,000 entries in
> the database and Tom S. has examined 1,000 entries and has found 30
> errors then this tells us something.
During some experimental work, I discovered errors in the CDS portions
of GenBank feature tables. These mistakes were incorrect designations
of exon-intron borders that did not appear in the original journal
articles. I suspect that these errors were introduced either when the
authors of these articles electronically submitted the entries, or when
the entries were manually typed in at GenBank. I have notified
genbank.updates about the plant genes. The locus names are listed
below, with the number of errors for each in parentheses.
Of 279 genes examined, 31 mistakes (11%) were found:
RICRAC2(4) RICRAC3(2) RICRAC7(4) MZEOPA2(2)
BLYGLUEND(1) MZEOPA2(2) CIPPPCA(1) CIPPPCB(2)
PETRBCS08(2) TOMCAB8(2) TOMTRYINHI(4) TRTHB(3)
Of 202 genes examined, 6 mistakes (2.9%) were found:
MUSIGULVJ(1) MUSPSPC(2) RATTRPM2B(3)
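Marr's first question comes down to simple arithmetic: errors found divided by entries examined. The minimal sketch below, using hypothetical helper names parse_tally and error_rates, reads per-locus tallies written in the LOCUS(n) notation used above and reports two different rates, since some loci carry several mistakes: mistakes per gene examined (the kind of figure quoted above) and the share of examined genes with at least one mistake. It assumes locus names consist of upper-case letters and digits, and it adds together the counts for a locus that is listed more than once.

```python
import re

# Matches tokens such as "MUSPSPC(2)": a locus name followed by a count in
# parentheses. An optional space before "(" is tolerated.
TOKEN = re.compile(r"([A-Z0-9]+)\s*\((\d+)\)")


def parse_tally(text):
    """Return a dict mapping locus name -> number of errors found.

    Assumption: counts for a locus listed more than once are summed.
    """
    counts = {}
    for locus, n in TOKEN.findall(text):
        counts[locus] = counts.get(locus, 0) + int(n)
    return counts


def error_rates(counts, genes_examined):
    """Return (total errors, errors per gene in %, genes affected in %)."""
    total = sum(counts.values())
    per_gene = 100.0 * total / genes_examined
    affected = 100.0 * len(counts) / genes_examined
    return total, per_gene, affected


if __name__ == "__main__":
    tally = parse_tally("MUSIGULVJ(1) MUSPSPC(2) RATTRPM2B(3)")
    total, per_gene, affected = error_rates(tally, 202)
    print(f"{total} errors; {per_gene:.2f}% per gene; "
          f"{affected:.2f}% of genes affected")
```

Run on the second tally with 202 genes examined, it prints "6 errors; 2.97% per gene; 1.49% of genes affected". The gap between the two rates is the point of reporting both: a few loci with several mistakes each can make the per-gene figure look worse than the fraction of entries actually affected.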
> 2. What is the nature of each error? Is the error most likely
> attributable to mistakes that the GenBank (or EMBL, DDBJ) staff has
> made? Or is the error attributable to the original author? Is the
> error attributable to software written by a secondary distributor of
> GenBank? The point here is that there are many independent sources of
> error and to be able to proceed with a plan to fix errors, this
> detailed type of information is required. It does nothing to make wide
> assertions which are prejudicial and not based upon the scientific
> method. I for one have little room for prejudice whether it's in a
> social context or a scientific context.
I am sorry this post doesn't answer the above questions more
rigorously. I can say that some of the accessions were labeled
"automatic". I do agree that there are many possible sources of
mistakes. This is an emotional issue, and, interestingly, all parties
concerned at least appear to want GenBank to work. In my opinion, the
issue is less about casting blame than about what can be done to
correct known errors in the database.
owen white (owhite at nmsu.edu)
there is no god, there is only noise
there is no noise, there is only god
the difference between art and science is that in art, if something
works, it doesn't have to make sense.