Dave Matthews sent the following message to Graingenes. Recommended.
-------- Original Message --------
Subject: USDA animal bioinformatics workshop
Date: Wed, 6 Nov 2002 11:57:23 -0500 (EST)
From: Dave Matthews <matthews at greengenes.cit.cornell.edu>
To: Grains mailgroup <grains at greengenes.cit.cornell.edu>
The USDA is holding an online public workshop to plan the future of
animal genome databases. Many of the issues being discussed are
equally relevant to plant bioinformatics, so you might be interested
in looking at it. A sample posting from this morning is below.
Currently there's a panel discussion going on, then on Nov 11 it will
be opened up for public comment. The URL is
Date: Wed, 06 Nov 2002 08:00:30 -0600
From: David Adelson <david.adelson at tamu.edu>
To: <bioinfo-panelists at mail.ahc.umn.edu>
Subject: [Bioinfo-panelists] Database, what kind?
We have been asked to discuss animal genome databases, so let's get
started. I will go out on a limb and make a few assertions, some of
which are probably obvious, some of which are certainly wrong. Most
of these points are directed at where we need to go, not necessarily
at where we are or should be now.
1. A relational database is a given; because of fiscal constraints,
it will be either MySQL or PostgreSQL.
2. Since the highest resolution genome map is a genome sequence,
whatever database format is chosen for current needs has to be able
to support a sequence database.
3. There are two good models for us to emulate but not duplicate:
NCBI and UCSC. Neither of them is perfect for comparative genomics.
The primary key (sequence accessions) will still be NCBI-generated
for all species.
4. For a sequence database, comprehensive annotation is the ultimate
goal, where every base pair is a potential annotation site.
5. Map data will ultimately be converted to sequence annotation.
6. Expression data will become annotation.
7. Functional data is obvious annotation; more on that later.
8. Other kinds of annotation include promoters, splice sites,
polymorphism, potential stem/loop structures.
9. There is already a good first attempt at this type of
architecture, called GMOD (Generic Model Organism Database,
http://www.gmod.org/ ). This could serve as an off-the-shelf core
that could be customized and improved for any species, thus ensuring
10. Different genome sequence databases will be linked by sequence
similarity, but more significantly by equivalent structures and
functions. Right now the focus is mapping and synteny.
11. Back to functional annotation. An ontology is a given, such as
the one used by the Gene Ontology Consortium. We need to make sure
that the ontology is based on accepted terms and is evidence-based.
Perhaps most critical, any ontology has to have a dictionary so that
users can actually use it to formulate queries.
12. How we view the data or how we can choose to view the data is
important. The UCSC track-based viewer is great, but not if you want
to look at a whole chromosome or a big chunk of a chromosome. There
are a number of different ways genome coordinate information,
functional annotation and expression data can be visualized, both for
a single genome and to compare across genomes. This is a truism, but
for a database to be useful it has to be user-friendly.
13. Curation is a requirement, as is distributed annotation. How to
reconcile the two is the problem.
14. I have not even begun to think about how population data for
quantitative traits can be incorporated as annotation for a sequence
database, but that is clearly a topic that needs to be addressed.
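To make points 1 through 8 and 11 a little more concrete, here is a
minimal sketch of "everything is annotation on sequence coordinates"
in a relational database. It assumes SQLite (through Python's sqlite3
module) purely as a stand-in for MySQL or PostgreSQL. All table and
column names, the accession DEMO_000001, the term TERM:0001 and the
sample rows are hypothetical and made up for illustration; none of
this is the GMOD schema or real NCBI data.

```python
import sqlite3

# Hypothetical schema for illustration only (not GMOD, not NCBI).
# SQLite is an assumed stand-in for MySQL or PostgreSQL.
SCHEMA = """
CREATE TABLE sequence (
    accession TEXT PRIMARY KEY,   -- externally assigned accession as the key
    species   TEXT NOT NULL,
    length    INTEGER NOT NULL
);
CREATE TABLE ontology_term (
    term_id TEXT PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE term_synonym (       -- the "dictionary" users query with
    synonym TEXT NOT NULL,
    term_id TEXT NOT NULL REFERENCES ontology_term(term_id)
);
CREATE TABLE annotation (
    id        INTEGER PRIMARY KEY,
    accession TEXT NOT NULL REFERENCES sequence(accession),
    start_bp  INTEGER NOT NULL,   -- 1-based, inclusive
    end_bp    INTEGER NOT NULL,   -- 1-based, inclusive
    kind      TEXT NOT NULL,      -- 'marker', 'function', 'polymorphism', ...
    term_id   TEXT REFERENCES ontology_term(term_id),
    evidence  TEXT
);
"""


def build_demo_db():
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO sequence VALUES ('DEMO_000001', 'Bos taurus', 100000)"
    )
    conn.execute(
        "INSERT INTO ontology_term VALUES ('TERM:0001', 'kinase activity')"
    )
    conn.execute(
        "INSERT INTO term_synonym VALUES ('phosphotransferase', 'TERM:0001')"
    )
    conn.executemany(
        "INSERT INTO annotation "
        "(accession, start_bp, end_bp, kind, term_id, evidence) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [
            # A single base pair can be an annotation site (point 4).
            ("DEMO_000001", 1200, 1200, "polymorphism", None, None),
            # Map data expressed as sequence annotation (point 5).
            ("DEMO_000001", 5000, 5400, "marker", None, "linkage map"),
            # Functional annotation tied to an ontology term (points 7, 11).
            ("DEMO_000001", 5100, 9000, "function", "TERM:0001", "experimental"),
        ],
    )
    return conn


def annotations_overlapping(conn, accession, start_bp, end_bp):
    """Return (kind, start, end) for annotations overlapping a region."""
    rows = conn.execute(
        "SELECT kind, start_bp, end_bp FROM annotation "
        "WHERE accession = ? AND start_bp <= ? AND end_bp >= ? "
        "ORDER BY start_bp, end_bp",
        (accession, end_bp, start_bp),
    )
    return rows.fetchall()


def annotations_for_word(conn, word):
    """Find annotations by ontology term name or by any synonym of it."""
    rows = conn.execute(
        "SELECT DISTINCT a.accession, a.start_bp, a.end_bp "
        "FROM annotation a "
        "JOIN ontology_term t ON t.term_id = a.term_id "
        "LEFT JOIN term_synonym s ON s.term_id = t.term_id "
        "WHERE t.name = ? OR s.synonym = ? "
        "ORDER BY a.accession, a.start_bp",
        (word, word),
    )
    return rows.fetchall()
```

With this layout, a call such as annotations_overlapping(conn,
"DEMO_000001", 5050, 5200) pulls back the marker and the functional
annotation together, whatever kind of data they started out as. The
synonym table is one possible reading of the "dictionary" in point
11: a user can search on a familiar word without knowing the
official term.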
Enough from me for now.
David L. Adelson, Ph.D.
Associate Professor - Animal Genomics
Dept. of Animal Science
Dept. of Veterinary Anatomy and Public Health
Faculty of Genetics
Animal Science Dept.
Texas A&M University
TAMU - 2471
College Station, TX 77843-2471