AAtDB farewell

Brian Fristensky frist
Sat Aug 5 14:20:21 EST 1995

john.morris at FRODO.MGH.HARVARD.EDU (John Morris) wrote:
>It is with regret that we are sending this announcement of the termination
>of the AAtDB curator service.  The release announcement which follows as a
>separate email message will be the last one from the AAtDB project located
>in the Goodman lab.
>There is clearly a need for a database service, as evidenced by discussions
>on this news group and the over 1000 daily "transactions" that take place
>on our gopher server.  Unfortunately, in these times of tight budgeting,
>neither USDA nor NSF appear interested in providing funds for an
>Arabidopsis database on a continuing basis.  We are thankful, however, for
>the support that the NAL and USDA has provided to us over the past several
>years and which has allowed us to establish AAtDB and hopefully thereby
>serve at least some of the database needs of the Arabidopsis community.

I am reminded of a book called ENERGY FUTURE, written some years ago by
two Harvard economists. They evaluated various sources of energy (coal, gas,
nuclear, solar etc.) and came up with relative costs expressed as barrels of
oil. When conservation was evaluated as an energy source, it turned out to be
cheaper than any other.

I think databases are to science what conservation is to energy. Yes, databases
don't generate new data per se. However, a tremendous amount of the data that
IS generated is effectively lost to the research community. The passage of time
is probably the main factor that makes scientific data gradually fade away. The
sheer volume of the literature means that a given piece of published data is
more likely to be missed. Along these lines, the explosion in the numbers of
journals, and their costs, is well along the way to destroying the usefulness
of institutional libraries. Today, the launch of a new journal doesn't bring
any more information to our library. On the contrary, it makes the percentage
of data that we will never see that much higher. 

Indices like Current Contents on Disk are more of a band-aid than a solution.
The actual information that is indexed in these sources is limited, relative to
what appears in the paper itself. Furthermore, obtaining articles, either by
mail or interlibrary FAX, is slow.

The development of Internet-accessible databases is a way to increase the
efficiency with which data is used, to preserve its value over longer times,
and to make it useful to a larger portion of the research community.  In these
respects, databases could be said to greatly decrease the entropic loss of data
that occurs in a print-oriented distribution system.

I recently submitted a grant to Canada's Natural Sciences and Engineering
Research Council (NSERC) specifically intended for the creation of a database
for cruciferous plants other than Arabidopsis. These would include canola,
radish, mustard, cabbage and so on. The economic importance of canola
(rapeseed) to Canada made it easy to get letters of support from numerous
commercial firms from the canola industry. Every crucifer researcher I talked
to was enthusiastic about the idea. Unfortunately, this grant was not even sent
out for review. The only reason given was that, although a postdoc position was
included in the grant, it still did not meet the Mission Statement requirement
for training highly specialized personnel, since that postdoc would not be doing
any actual experiments. 

Funding databases in science makes the same sense as conserving energy. 
(If you don't think conservation makes good economic sense, just think 
about the fact that your gas and electric companies are always urging
consumers to conserve.) With databases, the data generated by each research
project gets more bang for the buck. Additionally, the ease with which future
researchers can access that data saves an awful lot of reinvention of the
wheel. This is nowhere more apparent than in areas such as map-based cloning or
marker-assisted selection, for which the cumulative knowledge of all prior
mapping projects can greatly speed up progress. 

Both the economic upheavals and technological advances of recent years should
be causing us to re-think the way we structure the process of scientific
I cannot visualize a future in which databases do not play a central role in
everyday science.

