Arabidopsis Database 2 of 4 parts

Chris.Somerville 21847CRS at MSU.EDU
Sun Aug 8 23:02:00 EST 1993

Report of the Ad Hoc Committee on Arabidopsis Database Requirements

Executive Summary
      An ad hoc committee representing Arabidopsis researchers and
government agencies met in Dallas on 5-6 June 1993 to discuss the
informatics needs of the Arabidopsis research community.  The
committee discussed what role the Arabidopsis database (ADB) could
serve in the overall Multinational Arabidopsis thaliana Genome
Research Project.  Biological and technical features that should be
considered in proposals for database development were outlined.
The committee recognized that one or more linked Arabidopsis
databases will play a key role in the future of the Arabidopsis
genome project.  ADB could also provide an important model for
database development in other areas of biology.

The committee recommended that:
      - Arabidopsis database(s) be established and maintained for
use by the Arabidopsis research community.  Funding of such
database services was considered an appropriate use of federal
funds.
      - The database(s) have both service and research components,
with the service component modeled on a scientific journal,
including a chief editor, associate editors, and reviewers.
      - The database(s) be linked to the needs of the community
through an oversight committee.
      - The database(s) be generally accessible both nationally and
internationally through electronic networks, with all data
available on request.
      - The design of the database should allow portability to new
generations of database software and to new generations of
computer hardware.
      Specific recommendations were also made on the content of the
database, the linkages between different types of data, and the
design features that would allow the database to be of maximal use.
These are described in detail in the sections that follow.


      Computer databases have become an essential resource in
disciplines of biology such as genetics, ecology and taxonomy,
where large amounts of archival-quality data have accumulated.  The
rapid growth of information on the structure and function of the
Arabidopsis genome has resulted in a widely perceived need for the
development of one or more databases which will permit facile
access to this information.  This sense of need was prompted by the
accumulation of large amounts of data concerning physical and
genetic mapping of the Arabidopsis genome and large-scale
sequencing efforts associated with the Multinational Arabidopsis
Genome Project as described in the most recent Annual report (NSF
Publication 92-112).  Several groups within the Arabidopsis
community have responded by adapting preexisting databases, such as
ACEDB, to meet some of the most urgent requirements.  Other groups
have initiated the development of new databases.  These initiatives
have greatly increased the awareness of database utility within the
Arabidopsis community and have raised many questions concerning the
design and maintenance of public databases for Arabidopsis and
other plants.
      In order to assess the long-term database needs of the
Arabidopsis community, an NSF-sponsored workshop on this topic was
convened in Dallas on 5 June 1993. The workshop participants
included the elected members of the North American Arabidopsis
Steering Committee.  Representatives from the National Science
Foundation, the U.S. Department of Agriculture, and the European
Community were present as observers.  Two scientists involved with
the human Genome Data Base (GDB) were also present as technical
advisors.  A list of participants is given at the end of this
document.  A number of written suggestions received from the
Arabidopsis community in response to announcements on the
Arabidopsis electronic newsgroup were discussed during the course
of the workshop.
      The general goals of the meeting were to examine the present
and future needs for a database and to outline in general terms the
main issues which should be addressed in any future proposals
concerning the development of new or expanded Arabidopsis
databases.  The discussions were intentionally focused on
biological and community issues and there was no attempt to define
or specify issues which are primarily related to specific computer
hardware or specific database programs.  In this and related
respects, the following report is intended to be only a guide to
assist in the development and review of grant proposals directed
towards the development of expanded database resources for the
Arabidopsis community.  It is hoped that the report may also be of
utility to colleagues contemplating the development of databases
for other organisms.
      A central issue of the workshop concerned the question of
what purpose an Arabidopsis database should serve.  This was
approached by listing all of the different types of data that can
be obtained for Arabidopsis or other plants, and then considering
whether it would be essential, advantageous, or of little utility
to have such data available in a highly interrelated, easily
searchable electronic form.  The outcome of this far-reaching
discussion was the recognition that, if properly conceived and
constructed, an encyclopedic database interrelating everything from
nucleotide sequences to ecological data could provide a completely
novel research tool that would potentiate new kinds of discoveries
in biology.
      Because of the relative ease of data access, such a database
might eventually displace scientific journals as a source of
certain kinds of primary scientific information.  This
scientifically exciting possibility lies at the heart of the
following report.  However, in view of the large costs likely to be
incurred in developing and maintaining an encyclopedic database,
and in view of the technical challenges such a database presents at
the present time, the necessity of assigning priorities to certain
aspects of database development was recognized.

      There were six main conclusions from the workshop:

      First, although all Arabidopsis information need not be
archived in a single database, it is essential that all Arabidopsis
databases be able to communicate with each other and with other
major databases, such as the nucleic acid databases, in a seamless
networked fashion.  In this respect, the word "database" may refer
to a collection of separate and specialized databases which are
developed and maintained by different groups, but which operate as
one large federated information resource with a common controlled
vocabulary, user interface and editorial practices.  In this and
related respects it was considered essential that Arabidopsis
databases be available via international electronic networks.
      Second, the maintenance of databases involves a major service
component. Because the community will increasingly rely upon such
a database to provide access to much crucial information, it is
imperative that Arabidopsis databases be implemented and managed in
such a way that responsiveness to present and future community
needs will be ensured.
      Third, the work will involve an essential research component.
Shorter term research will be required to determine the best way to
implement many of the current needs for such a database, and longer
term research will be required to ensure that the database can
migrate over time to different and improved hardware and software
platforms.  The database must have the ability to grow with the
projected increase in information which would result from the
complete sequencing of the Arabidopsis genome and the analysis of
corresponding gene functions.
      Fourth, the databases should provide, in some real sense, an
intellectual focus for the interpretation of biological data.  In
particular, the committee noted that mechanisms must be devised to
provide for the synthesis, integration, and editorial control of
information.  The concept of structuring a database along the lines
of a scientific journal is attractive in this respect (i.e., a
database should have editors, reviewers, and production staff).  It
is also essential that the database be available for the
participation of the international scientific community.
      Fifth, all of the data and knowledge residing in such a
database will form an irreplaceable intellectual resource for the
scientific community.  Therefore, federal support for such a
database should be long-t
