GenBank errors

deustachio at mcclb0.med.nyu.edu deustachio at mcclb0.med.nyu.edu
Sat Oct 19 10:03:51 EST 1991


     An unfortunate aspect of the Schneider and White postings is that a
real scientific issue is getting lost in the flames.  As best I can gather,
what ignited Schneider was the observation that several different groups
had submitted sequences of what he believes to be the identical gene, and
all were entered into the database with no, or inadequate, annotations to
alert an unwary user who stumbles on one of them to the possible
redundancy. 

     The scientific issue is how to decide, given only two nearly
identical sequences, obtained by two different groups under two sets of
circumstances, whether they are 1) in reality identical but contaminated by
sequencing errors (see, e.g., the discussion of TC4 sequences in Matsumoto
& Beach, Cell 66: 347 (1991), for a bit of autobiography in this regard); or
2) alternate allelic forms of a single gene; or 3) different members of a
multigene family.  This decision becomes even more complicated when the
genetic source of the DNA that was analyzed is unclear ("lab mouse cDNA",
or "human placental genomic DNA"), and more complicated still when the
comparison is being made between species. 

     How should these analyses be reflected in a sequence database?  It 
seems likely to me, from my experience with analogous databases of mouse 
genetic linkage data (where the issue is whether two phenotypes defined in 
two different assays and shown to map to the same or nearly the same 
genetic point define a single gene with several properties or two genes 
that are very closely linked), that all of the important cases will be
controversial.  It's unlikely that a single universally acceptable
annotation could be constructed, and less likely still that a librarian
(human or automatic), as opposed to an expert in the scientific area, could
construct such an annotation.

     This would seem to me to be an area in which GenBank might ultimately
have to reach out to its user community still more than it already has, and
provide some way for experts with axes to grind not only to annotate their
own sequences, but also to attach commentary annotations to other groups'
sequences.  Of course, a path already open is to update the
annotations to one's own sequences to include such interpretations and
cross-references.

Peter D'Eustachio / NYU Medical Center - Biochemistry / New York,NY
deustachio at mcclb0.med.nyu.edu


