Arabidopsis database survey results
Randall L Scholl
rscholl at magnus.acs.ohio-state.edu
Tue Jan 28 18:56:44 EST 1992
Sakti Pramanik, Michigan State University
Randy Scholl, Ohio State University
Keith Davis, Ohio State University
Thomas Cook, Michigan State University
ANALYSIS OF AIMS DATABASE SURVEY RESULTS
The following are the results of the survey questionnaire conducted to
identify the information management and information processing needs of
the Arabidopsis thaliana research community. This information will
serve as a basis for developing the requirements specification
for the Arabidopsis Information Management System (AIMS). The
project to develop AIMS is supported by the National Science
Foundation. Section II of this document gives a summary of the
survey results. The detailed analysis of the survey is given in
section III. You may email your comments to aims at genesys.msu.edu.
II. GENERAL DISCUSSION
The categories of information addressed by the questionnaire were
varied and covered almost every aspect of Arabidopsis genetics.
Nevertheless, essentially all types of information addressed were deemed
to be desirable for inclusion in the database by those completing the
questionnaire. A scale of 1 to 5, with 1 representing the highest
desirability, was employed for all questions. Means and variances were
calculated on this scale across the surveys returned for each question.
A majority of the mean responses fell in the range 1.2 - 1.8. The means
of a few classes ranged from 2.0 - 2.8. Variances were not extremely
large - a little greater than 1 (note that when the means are so close
to one extreme of the range of possible values, the variance must be
relatively small). Based on variances of this magnitude, a difference
between means of approximately 0.4 can be considered statistically
significant.
On the above basis, some categories were deemed more important by
the respondents than others. This provides a basis for considering which
types might be excluded or receive less prominence.
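The 0.4 figure quoted above can be roughly reproduced with a short
calculation. The sketch below is illustrative only: the report does not
say which test was used, so it assumes a two-sided z-test at the 5% level
on two independent groups of 55 responses, each with a variance of about 1.
Because the same respondents answered every question, a paired analysis
could give a different threshold. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_significant_difference(variance, n, alpha=0.05):
    """Smallest difference between two means that a two-sided z-test
    would flag at level alpha, assuming two independent groups of n
    responses each with the same variance (an assumption, not something
    stated in the report)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    standard_error = sqrt(variance / n + variance / n)
    return z * standard_error


# 55 returned surveys and a variance of about 1, as described in the text.
threshold = min_significant_difference(variance=1.0, n=55)
print(round(threshold, 2))
```

Under these assumptions the threshold comes out a little under 0.4, which
is in line with the approximate figure given in the discussion.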
Approximately 175 surveys were distributed to the community. Of
these, 55 were returned, a response rate of roughly 31%. Obviously, the
completed surveys could represent
a biased subset of the community at large - it might also be argued that
the respondents represent those with strong interest and knowledge of the
subject, and thus individuals who are more likely to have positive
attitudes toward an electronic database. However, the current response
rate is very high by most standards for voluntary surveys. Thus it has to
be concluded that the overall support for and interest in such an endeavor
is very high.
A. Seed Stock Section
No items in this category elicited a negative (>3.0) response. The
majority of means fell within the range 1.2-1.8. This is in keeping
with a philosophy that seed stock/genetic data represent a central focus
and that as much information as possible should be conveyed about stocks
and genes of Arabidopsis. The inclusion of extensive data dealing with
morphological characteristics of ecotypes was not judged to have highest
priority. Since such data are readily available, it is suggested that these
data be included in the database as a special section.
Stock numbers coded for class of stock were not deemed important
by the respondents. Since such systems can lead to logistical problems
anyway, it is suggested that coded stock numbers not be used.
The representation of pedigree data such as degree of backcrossing
was deemed important. It was suggested by one respondent that stocks
that have not been subjected to such techniques not be accepted into the
collection. Since a stock's pedigree represents extremely important
information, we will try to develop a means of efficiently collecting and
representing complete pedigree information in the database. We are
exploring some ideas in this regard, and suggestions on this subject are
welcome.
B. DNA Collection
The overall consensus on the need for information of all types was
extremely high for the DNA stocks - the overall mean was approximately
1.4. One feature deemed to be of clearly lower priority was banding
patterns for clones. Information for contig clones regarding overlap, and
the methods used to determine it, was given relatively low priority;
inclusion of other information about contigs, however, was given high
priority.
It is clear that the respondents consider the inclusion of detailed
information concerning clones to be one of the highest priorities of this
C. Linkage and Genetic Maps
This information was considered highly desirable, in general. The
ability to manipulate the linkage data locally was judged of lower
priority. The method of analyzing linkage data and constructing maps from
linkage, the types of consensus maps and whether analytical tools are to be
included in this database represent one of the important decisions to be
confronted. Initially, it may be best to include updated genetic,
physical and combined maps for reference, along with raw segregation data.
This would allow individuals to download data relevant to their own
experiments and analyze it locally. Prompt submission of data to a centralized
mapping location would allow its incorporation into the AIMS data and map so
that the published map is current and accurate. Further input from the
community on this matter is welcome.
D. Sequence Data
A strong need for DNA sequence data was indicated by the respondents. There is
less need for other related data such as protein sequence. However, a number
of individuals commented that the availability of data from GenBank et al.
eliminates the need for inclusion of such information in our database.
References to such information were considered important, as would be expected.
The receipt and inclusion of sequence data by AIMS must be coordinated
with the Arabidopsis sequencing efforts now being initiated world-wide.
Since these may include analysis of such data, it seems clear that the
results of such analysis would represent very desirable data for AIMS.
The development of AIMS will be coordinated fully with these efforts.
E. Software/Hardware
The Software/Hardware section was designed to gather information about
the availability of user-end software/hardware resources and the need
for specialized software packages. Of the 55 respondents, 40 indicated
that they have a Mac and 24 that they have IBM PC compatibles.
Of those with IBM PC compatibles, 15 have VGA.
Most of the respondents (47) have access to email. The
following software tools were suggested for inclusion in the AIMS database:
1. Comparing genetic maps
2. Comparing physical maps
3. Developing consensus maps
4. Finding overlaps, lengths, and relative positions
5. Software for sequence search and comparison
A few people (14) thought software for protein sequence analysis
should be included.
F. Impact on AIMS Development and Development Schedule
Several areas will receive special attention as indicated in the above
discussion. The basic information about seed and DNA stocks is considered
important by everyone. Furthermore, if detailed data on these subjects can be
incorporated into the database, as the survey results suggest is desirable,
the database can become a viable research tool for Arabidopsis
laboratories. Likewise, users should be able to efficiently acquire information
about potential stocks to be ordered from the collection.
The development of mapping facilities is also seen as an important goal.
This will be coordinated with existing large laboratories conducting mapping
efforts and with individuals designated as curators of mapping data. In the
long term, it may be desirable to develop full capability for the analysis of
mapping data by AIMS. The schedule for the implementation of AIMS, which was
published in our introductory electronic and direct mail communications,
is still achievable at this time. Our plan remains to have a prototype system,
containing basic information about seed and DNA stocks, on line for e-mail
this spring - probably as early as April. The full system should be
operational in the fall of 1992. We will keep you fully informed by e-mail
and other means on our progress.
III. SURVEY RESULTS
Number of questionnaires processed: 55
Number of questionnaires by country:
The Netherlands: 2
A. TYPES OF DATA
The following types of data were ranked from 1 (Most Important) to 5
(Least Important) for their importance to the community.
i) Seed Stock Data
- special growth requirements
Responses: 55, Mean: 1.345455, Variance: 0.626116
016: 1 eg. thiamine, SA mutants.
- locus been cloned?
Responses: 54, Mean: 1.407407, Variance: 0.574760
- genetic map location
Responses: 55, Mean: 1.418182, Variance: 0.606942
016: 1 hopefully update, when maps are updated!
- ecotype of mutant lines
Responses: 55, Mean: 1.454545, Variance: 0.611570
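The means and variances listed above appear consistent with a variance
computed by dividing by the number of responses (a population variance)
rather than by n - 1. The sketch below illustrates that calculation. The
tally of ratings is hypothetical: the report does not give the actual
distribution of answers, and these counts were chosen only because they
reproduce the figures reported for "special growth requirements". The
function name is likewise an illustration.

```python
def mean_and_population_variance(counts):
    """Return (n, mean, variance) for a tally of ratings.

    counts maps each rating (1 to 5) to the number of respondents who
    gave it. The variance divides by n, not n - 1.
    """
    n = sum(counts.values())
    mean = sum(rating * c for rating, c in counts.items()) / n
    variance = sum(c * (rating - mean) ** 2 for rating, c in counts.items()) / n
    return n, mean, variance


# Hypothetical tally (NOT from the survey) that matches the reported values.
hypothetical_counts = {1: 43, 2: 8, 3: 2, 4: 1, 5: 1}
n, mean, variance = mean_and_population_variance(hypothetical_counts)
print(n, f"{mean:.6f}", f"{variance:.6f}")  # 55 1.345455 0.626116
```

Dividing by n - 1 instead would give a slightly larger variance (about
0.6377 for this tally), which does not match the reported 0.626116.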