An Arabidopsis Genome Database

cherry at FRODO.MGH.HARVARD.EDU
Thu Sep 12 13:13:49 EST 1991


To:	Members of the Arabidopsis research community

From:
       Sam Cartinhour    cartinhour at frodo.mgh.harvard.edu
       Mike Cherry       cherry at frodo.mgh.harvard.edu
       Brian Hauge       hauge at frodo.mgh.harvard.edu
       Howard Goodman

       Department of Molecular Biology
       50 Blossom Street
       Massachusetts General Hospital
       Boston MA  02114

Re:	An Arabidopsis Genome Database


The rapid accumulation of molecular and genetic data has created an urgent
need for a database tailored to the Arabidopsis community. Recently, we
have begun the task of collecting information and building such a database.
This project is supported by a grant from the U.S. Department of
Agriculture through the National Agricultural Library. The database uses a
mouse-based interface with graphical windows and will be available early
next year or sooner if things go well.

The purpose of this message is twofold: first, we want to briefly describe
the database, and second, we want to invite interested parties to
participate by submitting information, providing feedback, making
suggestions, and ultimately running the database on computers in their own
laboratories. At the end of this message is a short questionnaire we would
like you to fill out. This questionnaire will help us tailor the database
to the needs of the greater Arabidopsis community and will also allow us to
establish a mailing list for individuals actively interested in helping
shape this important resource.


** The Arabidopsis database is based on a new C. elegans Database:

Recently, Richard Durbin (MRC, UK) and Jean Thierry-Mieg (CNRS-CRBM,
France) developed and released a database containing C. elegans genome
information. ACEDB ("a C. elegans database") contains genetic, cosmid, and
YAC maps, lists of strains, phenotype characteristics, bibliographic
references, duplications and deficiencies, information on 2-point and
3-point crosses, laboratories, and available cloned DNAs, as well as DNA
and protein sequence data with feature tables. Please note that this is not
a complete list of all types of information contained within the database.

Fortunately for all of us, ACEDB is a very versatile database and can be
adapted easily to display Arabidopsis information. The initial phase of
this task is almost complete and we are now loading all the data we have at
hand. We call this database AATDB (pronounced at-d-b), "an Arabidopsis
thaliana database."


** AATDB -- main features:

AATDB provides a graphical presentation of the genome information and is
navigated via menus and a mouse. Chains of information are easily followed
by clicking on a displayed object (for example, a gene name); the related
information (for example, position on the physical map) is presented in a
new window. Extensive links between different classes of information make
it possible to locate interesting data from many different starting points.
The database supports both "casual" searches, where users "click around"
from object to object, and "formal" searches, where users query the
database using a special language. Text and graphical information can be
retrieved in printed form on a PostScript printer.
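
As a rough illustration of the two search styles described above, here is a
minimal sketch in Python. It is not ACEDB or AATDB code and does not use the
ACEDB query language; the object names, the LINKS table, and the functions
follow, find_by_class, and reachable are all hypothetical, invented only to
show how objects grouped into classes and joined by links can be browsed or
queried.

```python
from collections import deque

# Hypothetical data: (class, name) -> linked (class, name) objects.
# None of these entries come from AATDB; they are invented examples.
LINKS = {
    ("Gene", "gene-A"): [("Clone", "cosmid-1"), ("Paper", "paper-1")],
    ("Clone", "cosmid-1"): [("Contig", "contig-1"), ("Gene", "gene-A")],
    ("Contig", "contig-1"): [("Clone", "cosmid-1")],
    ("Paper", "paper-1"): [("Gene", "gene-A")],
}


def follow(key):
    """'Casual' browsing: the objects one click away from the given object."""
    return list(LINKS.get(key, []))


def find_by_class(cls):
    """'Formal' search (simplified): every object of one class, sorted."""
    return sorted(k for k in LINKS if k[0] == cls)


def reachable(start):
    """Every object that can be reached from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in LINKS.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

In this toy model, follow(("Gene", "gene-A")) plays the role of clicking on a
gene name, while reachable(("Paper", "paper-1")) shows that a contig can be
located starting from a bibliographic reference, because the links run in
both directions.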


** When will AATDB be available?

We hope to have the first distribution version of AATDB available early
next year. Subsequent versions will follow at regular intervals. An
announcement to the Arabidopsis Genome mailing list will precede the
initial release.


** Who will be able to use AATDB?

AATDB will be available to any researcher who wants it. You will need a
computer and a minimum of 50 megabytes of disk storage space just for the
database. Initially AATDB will be available for Sun Microsystems SPARC
workstations. However, the software is designed to compile on any system
running the X Window System. The developers of ACEDB are currently working
on a Macintosh version; once the Macintosh version of ACEDB is released, the
AATDB Macintosh version will soon follow. This Macintosh version will most
likely require a color monitor, a large hard disk or CD-ROM drive, and a
Macintosh II.


** What will be in AATDB?

We would like to make AATDB as useful as possible by including all
information of value to the Arabidopsis community. So far we plan to
include: the physical map with cosmid and YAC contigs, cloned genes, and
RFLPs; bibliographic references for all Arabidopsis papers; lists of stocks
and laboratories (with contact information); the genetic map, which
integrates classical, RFLP, and physical data; sequences with associated
features (obtained from searches of publicly available databases); and
mutations and their phenotypes.

Much of the information in AATDB (for example, the data on 20,000 cosmid
clones produced by Brian Hauge and Howard Goodman) will be as yet
unpublished. We are hoping to encourage labs to release such data, no
matter how "unimportant," to be included in AATDB. Data that seems
uninteresting or unimportant to you may nonetheless be useful in the
context of the greater collection of information found in AATDB. The value
of the database ultimately depends on your generosity.

It is important to emphasize that the success of this database requires
community effort. Easy access to a comprehensive Arabidopsis database will
benefit all of us in our day-to-day experimentation. Clearly, the more
published and unpublished data that is available in the database, the more
powerful the database becomes. Our intention is that the database will be
refined as an increasing number of people use it and submit new data or
point out new relationships among information already in the database. If we
want this to work, it is essential that we submit the unpublished data
sitting in our old notebooks. The DNA sequences of those false positives
that we pulled out of a screen may save you months of work. Similarly, you
may find that a partial sequence is already available for that gene you've
been thinking of cloning. We all agree that cloning, mapping, etc., by phone
or e-mail is both faster and more cost-efficient than doing it ourselves.
The database will avoid needless duplication of effort and we will all
benefit.


** AATDB data submission, updates, and distribution:

We intend to make AATDB available as a central repository for Arabidopsis
researchers. Information will be obtained from two sources: routine scans
of public databases, and submissions from individual laboratories. A
mechanism for data submission is being developed to make it as easy as
possible to submit information to the database curator(s). The curators
will maintain the integrity of AATDB. They will review the information
and, when appropriate, update the database or request more information from
the submitting individual. Note that in the case of unpublished sequence
data, we would prefer that you first submit it to GenBank or EMBL so that
it will be assigned an accession number (both databases accept sequences
even when there are no plans to publish them).

The software, updates to it, and data will be available by anonymous FTP on
the Internet. An electronic mail server is also planned. Further, we
anticipate that both the SPARC and Macintosh versions of AATDB will be
available on CD-ROM for a minimal fee.

** A short questionnaire:

Please help us match the needs of the Arabidopsis community to the features
of AATDB by answering a few questions. Also, if you are interested in this
project and would like to receive future announcements, please specify this
in the form below.

Fill out the following form and mail it back to the AATDB curator at the
Internet address: curator at weeds.mgh.harvard.edu

Please do not answer this survey by replying to the mailing list or Usenet
group. Please send your answers to the curator address.

Thank You!!

.........................................................................

To: curator at weeds.mgh.harvard.edu



