Science and Celera
pmp at UDel.Edu
Thu Dec 14 20:39:27 EST 2000
The e-mail for Donald Kennedy did not work for me. What did work
was kennedyd at stanford.edu.
On 14 Dec 2000, James McInerney wrote:
> Dear all,
> I have had this message passed to me. I hope the authors don't mind me
> passing it on to you.
> Dr. James O. McInerney,
> Department of Biology,
> National University of Ireland,
> Co. Kildare,
> +353 1 708 3860
> +353 1 708 3845
> Dear fellow bioinformatics developers:
> By now you have probably heard that Celera Genomics has submitted
> their human genome paper to the journal Science. Science and Celera
> have agreed to special terms for the release of the human genome
> sequence data. It will be made available through the Celera website,
> and will not be submitted to the international DNA database consortium
> (GenBank, EMBL and DDBJ). Science's statement regarding the agreement
> is at:
> All major journals, including Science, have a policy of deposition of
> sequence data with the "appropriate data bank". The accepted community
> standard is submission to GenBank/EMBL/DDBJ. The reason for this
> deposition is to make the results of the work openly available for
> future research. This principle was specifically mentioned in the
> Clinton/Blair statement on human genome sequencing,
> which strongly upheld the view that "unencumbered access" to genome
> data was critical.
> The terms of the Celera/Science agreement will give us access to the
> genome sequence, but not unencumbered access. Celera is suggesting
> publishing their data under an MTA (Material Transfer Agreement) which
> would prevent large-scale downloads and incorporation of this data
> into GenBank/EMBL/DDBJ. In order to download the data, you and your
> institution will have to sign a contract guaranteeing that you will
> not "redistribute" the Celera data.
> Science believes that the deal is an adequate compromise because it
> provides us the right to download the data and publish our results.
> We believe Science is thinking in terms of single-gene biology, not
> large-scale bioinformatics. It is probably not hard for you to imagine
> scenarios in bioinformatics in which "publication" and
> "redistribution" are virtually the same thing; we cannot imagine
> Celera allowing us to incorporate data into Pfam, for example,
> nor into Ensembl.
> We are asking for your support in writing to Science to politely
> insist that genome sequence papers should be accompanied by
> unencumbered deposition to GenBank/EMBL/DDBJ. Please note that we have
> no issue with Celera either keeping this data unpublished for
> commercial reasons or combining their data with freely
> available data from the public genome projects. We would defend their
> right to do either. Our view is simply that the genome community has
> established a clear principle that published genome data must be
> deposited in the international databases, that bioinformatics is
> fueled by this principle, and that Science therefore threatens to set
> a precedent that undermines our research.
> We encourage you to express your views on this matter to Donald
> Kennedy (kennedyd at kennedyd.pobox.stanford.edu), the Editor-in-Chief of
> Science, and/or to Barbara Jasny (bjasny at aaas.org), the managing
> editor in charge of genomics papers at Science.
> Here is a Q/A about some points.
> * Why does this matter?
> A classic example of how our field began to have an impact on
> molecular biology was Russ Doolittle's discovery of a significant
> sequence similarity between a viral oncogene and a cellular growth
> factor receptor. Russ could not have found that result if he did not
> have an aggregate database of previously published sequences. We have
> come a long way from Russ and his son typing data into the NEWAT
> protein sequence database by hand.
> Throughout the 80's the international database community fought hard
> to insist that DNA sequence data be deposited into the public domain
> databases. Journals now generally require deposition as a condition of
> accepting a paper. The forming of these databases and the
> international agreements on data sharing between the European,
> American and Japanese databases fostered the rapid development of
> bioinformatics research. We now all take for granted the fact that
> large DNA databases are accessible from a single point of contact, and
> the identifiers are coordinated worldwide.
> Bioinformatics research relies on open data with minimal legal
> encumbrances submitted to public databases. Without these databases
> there is no real substrate for bioinformatics research.
> * What would happen if this precedent were set?
> There are a number of consequences if Science set a precedent that
> allowed people to publish DNA data under a variety of MTAs.
> - One would not be able to form a single DNA database on which to
> do bioinformatics research, and the derivative databases (Swissprot,
> PIR, Pfam, PROSITE, etc.) would not be legal.
> - Bench biologists would have to visit a number of websites and
> possibly enter into a number of different contracts for access to DNA
> data. Unexpected informative homologies could become prohibitively
> difficult to find.
> - You may need to get a legal review before you can publish
> the results of an analysis, if your analysis is large-scale and
> detailed enough that it could be reasonably interpreted as a
> "redistribution" of the primary sequence data. You could
> be sued for breach of contract for a Web Supplement page
> that discloses extensive sequence data supporting your results.
> - Scientific openness will be undermined. Efforts to engage the
> community in cooperative annotation of large genomes, for instance,
> would be blocked -- we can't usefully annotate a genome we can't
> * Celera paid for it. Can't they set their own access terms?
> Absolutely. We have no issue with Celera's commercial data gathering,
> and their right to set their own access terms to their data. We do
> feel, though, that scientific publications carry a certain ethical
> responsibility. The purpose of a paper is to enable the community to
> efficiently build on your work. There is always a tension between
> disclosing your work to your competitors (this is not unique to
> private companies!) and receiving scientific credit for your work via
> publication. This tension is natural, and maintaining a consistent
> and acceptable balance is the reason that scientists and journals
> establish community standards that dictate how data are required to be
> disclosed. In this case, the clearly accepted community standard is
> that DNA sequence data are deposited in GenBank/EMBL/DDBJ upon
> publication.
> We certainly do not blame Celera (much) for seeking a special deal
> that lets them have their cake and eat it too -- they would
> understandably like scientific credit for their terrific and important
> work in human sequencing, and they would also like a profitable
> business model.
> We do blame Science for failing to take a strong stand in upholding
> accepted scientific publication practices. We cannot accept that it is
> necessary to sacrifice ethics for expediency.
> * Science claims they are honouring their own policy. What gives?
> Science now claims that all their policy really requires is that
> archival data be available via a publicly accessible database. We
> think this is a conveniently revisionist view of their own policy,
> which states (in Instructions to Authors):
> "archival data sets (such as sequence and structural data) must be
> deposited with the appropriate data bank and the identifier code should
> be sent to Science for inclusion in the published manuscript (coordinates
> must be released at the time of publication)"
> Notice the use of the definite article "THE appropriate data bank",
> the notion of "deposition", and the additional rider that the
> identifier code should be sent.
> The spirit of this statement seems clear to us. Science's statement
> anticipates that there is an appropriate, single, aggregate community
> database for each sort of archival data, whether DNA sequence, protein
> structure coordinates, or something else. Sensibly, they don't name
> every possible database for every possible archival data set. They
> expect that recognized community standards exist. In no way does
> Science's statement seem consistent with the view that an individual
> lab could start its own "public" DNA sequence database and send a
> meaningless internal database identifier; to try to read it that way
> is a post hoc rationalisation.
> * What can Science do? This is a done deal.
> It's true that this is a done deal. Science and Celera have mutually
> agreed to the general terms of data release. But there are two ways
> that we can minimize the damage.
> First, the details of the agreement are not set. In particular, there
> is no definition of allowed "publication" versus prohibited
> "redistribution". Science could specify definitions that did not
> interfere with noncommercial uses of the data in bioinformatics,
> allowing us redistribution rights if it made sense in the context of
> our project (for example, a genome annotation project like Ensembl).
> Second, and preferably, Science -- or even the peer reviewers -- can
> uphold Science's own data access policy, and reject the paper.
> Incidentally, they might also choose to enforce Science's policy on
> prior publication, which states "...the main findings of a paper
> should not have been reported in the mass media. Authors are, however,
> permitted to present their data at open meetings but should not
> overtly seek media attention." If I issued a press release upon
> submission of a manuscript to Science, like Celera did, Science would
> rightly fire it back to me without review.
> * What can I do?
> Agitate. Let Science know that you care. They consider this deal to be
> a trial balloon for future genome papers. Even if we can't change the
> deal with Celera, we can try to make sure it's a one-time-only deal
> that's viewed as a Big Mistake. Write a letter to Science and tell
> them how their actions would impact your research, both in the long
> term and in the short term. Also, you can pass on this open letter to
> other bioinformatics researchers you know.
> Dr Sean Eddy,
> Alvin Goldfarb Professor of Computational Biology,
> Howard Hughes Medical Institute, Washington University in St. Louis,
> Dr Ewan Birney
> Team Leader, Genomic Annotation
> European Bioinformatics Institute, UK