Sequence submission discussion
BROE at AARDVARK.UCS.UOKNOR.EDU
Tue Dec 5 09:10:00 EST 1989

Regarding the recent discussions of Genome Data Submission and NIH
Policy: I was also at the Wolf Trap meeting, and I was bothered
by two points:
1. The submission issue and hoarding of data.
2. A discussion of how many errors should be tolerated in the sequence
data in the human genome project.
This message addresses my concerns about the first issue; a second
message to follow will address the second.

Regarding the first item, those of us who sequence a reasonable
amount are well aware of the need for timely submission of data to GenBank or
EMBL, and most of us (although I have no statistics) submit the data in an
unannotated form via disk and snail mail. With the new GenBank-OnLine
now replacing many aspects of BIONET, data submission is just an e-mail
message away. Even without GenBank-OnLine, it has been possible for
quite some time now to e-mail sequence data directly to GenBank (and EMBL)
via their network addresses.
For GenBank, the address was, and I'm fairly sure still is:
gb-sub%life at lanl.gov
For EMBL, submission information can be obtained from the EMBL server
at DATALIB at EMBL. Those of us with the UWGCG program package can obtain
a copy of the sequence submission form on-line from our host VAX by
fetching the file GENBANK.FORM and then filling in the blanks using
one of the VAX editors. The problem of the actual submission of final,
published sequence data hopefully will go away once the community becomes
more computer literate and discovers e-mail. I know I'm preaching to the
choir.

As you and I (and others) have discussed many times, there is
considerable frustration in the sequencing community over the ability
to access sequence data once it has been submitted to one of the
databases. Both GenBank and EMBL are moving toward eventual daily
updates of their databases, and in the very near term weekly updates should
be a reality. It is very frustrating to see a sequence published in a
journal that requires GenBank/EMBL submission and not be able to obtain
the data from an on-line source. Both databases are addressing this issue,
and I have indications that this should not be a problem in the near future.
As you indicated in an earlier posting, the GenBank database now lags only
a month or so behind the journals in entering published data into the database.
By offering direct, timely access to this data on-line or via ftp, IG
is providing the kind of service we all can really use. We are still left with
the question of why the data was not submitted directly to the database
by the group that published the sequence, and thus why those at GenBank
must enter the data manually. I know you have tried to get everyone to
submit their final data to GenBank when they submit their manuscript and
wish all journals would make simultaneous submission to the database a
prerequisite for publication. I just do not understand why folks do not
do so.

As for hoarding of data, this too shall pass. We are judged by,
among other things, our publication record when it comes time for our
grant reviews. In the world of "Publish or Perish," the data is not data
until it has been published. If the sequences we complete are not published,
then Study Sections are less likely to rank a grant renewal as highly as
they would if publications resulted from the work. This is a self-limiting
process, and those known for hoarding data and not publishing it should, and
will, receive their just reward.

Bruce A. Roe
Professor of Chemistry and Biochemistry
INTERNET: BROE at aardvark.ucs.uoknor.edu
SnailNet: Department of Chemistry and Biochemistry
University of Oklahoma
620 Parrington Oval, Rm 208
Norman, Oklahoma 73017