more on GenBank errors
smouldering dog
owhite at nmsu.edu
Wed Oct 23 23:31:36 EST 1991
In article <9110231652.AA14074 at primate.cshl.org> marr at CSHL.ORG (Thomas G. Marr) writes:
> I guess I should respond to some of the criticisms leveled against my
> original response to Tom Schneider's remarks about errors in the
> GenBank database...
>
> I think what disturbed me most about his remarks was the fact that
> blanket assertions were made with no supporting data or statistics.
> For example, if he has encountered errors in the database, then we
> should have the details:
>
>
> 1. How many errors have been encountered in the database? How many
> entries have been examined? If there are, say, 30,000 entries in
> the database and Tom S. has examined 1,000 entries and has found 30
> errors then this tells us something.
During some experimental work I have done, I discovered errors in the
CDS portions of GenBank features files. These mistakes were incorrect
designations of exon-intron borders, and they did not appear in the
original journal articles. I suspect that these errors were introduced
either when the authors of these articles electronically submitted the
entries, or when the entries were manually typed at GenBank. I have
notified genbank.updates about the plant genes. The locus names
are provided, with the number of errors in parentheses.
plants:
of 279 genes examined,
31 mistakes (11%) were found
RICRAC2(4) RICRAC3(2) RICRAC7(4) MZEOPA2(2)
BLYGLUEND(1) MZEOPA2(2) CIPPPCA(1) CIPPPCB(2)
PETRBCS08(2) TOMCAB8(2) TOMTRYINHI(4) TRTHB(3)
PEAPHY(1) PEALEGAG(1)
rodents:
of 202 genes examined,
6 mistakes (3.0%) were found
MUSIGULVJ(1) MUSPSPC(2) RATTRPM2B (3)
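
For anyone who wants to check the arithmetic, the tallies above can be
recomputed with a minimal sketch like the one below. It is not part of
the original analysis: the function name tally and the parsing pattern
are illustrative assumptions, and the rates are simply mistakes per
gene examined, using the gene counts (279 and 202) given above.

```python
import re

def tally(listing):
    """Sum the parenthesized counts in a 'LOCUS(n)' listing.

    Returns (total_errors, number_of_listed_entries).
    """
    # Allow optional whitespace before the parenthesis, as in "RATTRPM2B (3)".
    pairs = re.findall(r"([A-Z0-9]+)\s*\((\d+)\)", listing)
    return sum(int(n) for _, n in pairs), len(pairs)

# Listings copied as posted. MZEOPA2 appears twice in the plant list,
# and both occurrences are kept here exactly as they were posted.
plants = """RICRAC2(4) RICRAC3(2) RICRAC7(4) MZEOPA2(2)
BLYGLUEND(1) MZEOPA2(2) CIPPPCA(1) CIPPPCB(2)
PETRBCS08(2) TOMCAB8(2) TOMTRYINHI(4) TRTHB(3)
PEAPHY(1) PEALEGAG(1)"""
rodents = "MUSIGULVJ(1) MUSPSPC(2) RATTRPM2B (3)"

plant_errors, plant_entries = tally(plants)
rodent_errors, rodent_entries = tally(rodents)

# Mistakes per gene examined, as a percentage.
plant_rate = 100 * plant_errors / 279
rodent_rate = 100 * rodent_errors / 202

print(f"plants:  {plant_errors} mistakes, {plant_rate:.1f}% of 279 genes")
print(f"rodents: {rodent_errors} mistakes, {rodent_rate:.1f}% of 202 genes")
```

Note that these percentages count mistakes, not affected entries, so a
locus with four mistakes contributes four to the numerator.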
> 2. What is the nature of the each error? Is the error most likely
> attributable to mistakes that the GenBank (or EMBL, DDBJ) staff has
> made? Or is the error attributable to the original author? Is the
> error attributable to software written by a secondary distributor of
> GenBank? The point here is that there are many independent sources of
> error and to be able to proceed with a plan to fix errors, this
> detailed type of information is required. It does nothing to make wide
> assertions which are prejudicial and not based upon the scientific
> method. I for one have little room for prejudice whether it's in a
> social context or a scientific context.
I am sorry this post doesn't answer the above questions in a more
rigorous way. Certainly, some accessions were labeled "automatic".
The point I agree with is that there are many possible sources of
mistakes. This is an emotional issue, and interestingly, all parties
concerned (at least appear to) want to see GenBank work. I am of the
opinion that casting blame is less the issue than what can be done to
correct known errors in the database.
--
owen white (owhite at nmsu.edu)
-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-*-=-=-*-=-
there is no god, there is only noise
there is no noise, there is only god
-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-*-=-=-*-=-
the difference between art and science is that in art, if something
works, it doesn't have to make sense.