In article <RSMITH.94Aug31204529 at dot.bcm.tmc.edu> rsmith at dot.bcm.tmc.edu
(Randall Smith) writes:
| : Announcing a new WWW Server: The BCM Sequence Annotation Server
| : URL: http://dot.imgen.bcm.tmc.edu:9331/seq-annot/home.html
This idea has both good and bad points and I think that they should be
considered extremely carefully.
First, the idea that individual researchers should be able to update a database
directly is wonderful. If everybody were to do that to the data they are
experts on, GenBank would quickly become a clean database.
However, there are several problems:
1. Corruption of the data, either intentional or inadvertent. The authors
are already aware of this possibility.
2. Inconsistency between researchers. The NCBI attempts to make entries
uniform by certain standards, but individual researchers cannot know what these
are in detail. The only solutions I see are to force the kinds of annotations
to follow certain forms or to pass all the changes to experts for editing. The
trouble with passing the data to experts is that the number of experts has to
grow exponentially along with the growth of the database. Maybe we just have
to accept that to avoid a worse mess!
3. Do the data flow into GenBank? There is no indication that they will. This
means that if GenBank makes a correction it won't go into this database and
vice versa. Since the data are not maintained in one place, they are duplicated
and will eventually become inconsistent. Whom should or will a researcher
believe? A randomly annotated database or the "official" one? How will
inconsistencies be resolved?
I see this as an experiment. Once the experiment has been shown to work
reasonably well, all the data should be passed to NCBI for careful processing
along with all new data. That is, the server should be run by NCBI or in
close conjunction with them.
For my reasoning behind this posting, see the philosophy paper available from:
http://fconvx.ncifcrf.gov:2001/~toms/onlinepapers.html
Tom Schneider
National Cancer Institute
Laboratory of Mathematical Biology
Frederick, Maryland 21702-1201
toms at ncifcrf.gov