Dear Yeast Netters,

Francis Ouellette has proposed a debate about *THE* yeast database:

      Do we need only *ONE* yeast database?

>::What would you like to have in it?

>Some ideas (to get the ball rolling): the strain list (mutants with
>phenotype description, with pictures of colonies and/or cells),
>the gene list (Mortimer et al., ... on line), an EMBL/GenBank yeast
>database, the yeast Email directory (my plug ;-), a list of references
>with abstracts (in what format?).

      The most obvious answer is all of them, and even more.

      Let me explain. Our database needs are constantly evolving. Today I
need sequences; tomorrow (in a few months or a few years) I will need
relationships between genes, or phenotypes, and so on.

      However, it seems very difficult to me to build a single database
containing all the information that might be requested today and
tomorrow. The problem is not hardware or software but ownership and
maintenance.

>::How would you organize it?

>Like ACeDB (plug for Stanford group to get into discussion)?
>How would you want things linked together?  Without getting into the
>nitty gritty of the software running this, maybe some general ideas
>on format, updating the data and so on.  Would it be practical
>to have different groups managing different databases, and another
>group integrating the different databases?  (this way ... you never
>lose "ownership" of your database). Or is that too limiting in the
>improvement on the integration of the data?

      In my opinion, the best solution is to connect small, specialized
databases. For example, the LISTA2 database will be (or already is) linked
to Mortimer's genetic map (Linder, personal communication). Small
specialized databases are easier to keep up to date.
      As for the other small specialized databases, all we need is to avoid
duplicated information. We have to establish connections between them and
a way to go from one to the other. In this way, any user can access what is
effectively an enormous database. Yet nobody has to maintain that enormous
database, since physically it consists of many small ones, each updated by
its own specialized managers.
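
      To make the idea of connected small databases concrete, here is a
minimal sketch in Python (chosen only for illustration). It assumes every
database shares a common key, the gene name; the two databases, their field
names and the entries "GENE1"/"ACC0001" are hypothetical placeholders, not
real data. Each group keeps only its own data, and a combined view is
assembled at query time by following the links, so nothing is copied into
a central store.

```python
from typing import Any

# Two hypothetical small databases, each owned and updated by its own group.
# Keys are gene names; each holds only its own kind of data (no duplication).
genetic_map_db: dict[str, dict[str, Any]] = {
    "GENE1": {"chromosome": "III", "position_cM": 10.0},  # placeholder values
}
sequence_db: dict[str, dict[str, Any]] = {
    "GENE1": {"embl_accession": "ACC0001"},  # placeholder accession
}

# The "connections": the member databases a query walks through, by name.
FEDERATION: list[tuple[str, dict[str, dict[str, Any]]]] = [
    ("map", genetic_map_db),
    ("sequence", sequence_db),
]


def lookup(gene: str) -> dict[str, Any]:
    """Assemble a combined view of one gene by following links into each database.

    Nothing is stored centrally: every call reads the current contents of
    each member database.
    """
    view: dict[str, Any] = {"gene": gene}
    for source, db in FEDERATION:
        record = db.get(gene)
        if record is not None:  # a database may simply know nothing about this gene
            view[source] = record
    return view
```

Because lookup() reads each member database on every call, an update made
by any group is visible at once, which is the maintenance advantage argued
above.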

>::Where would it run?

>Mac, DOS, Unix, VMS, some sort of window environment, on the Internet,
>gopher or some yet uninvented tool [please expand on this one :)?
>Let's not worry about "who" would be doing it, but maybe how it would be done.

      My experience with gopher is that, for the end user, something like
gopher will be the easiest way. On the other hand, gopher access is not
easy for everybody.

      So the question is: which small, specialized databases do we need?
My personal need for today:

      I am looking for a database like ListA2 that would give, for each
gene, its position on the genetic map, pointers to the EMBL databank and
to Swissprot to obtain the sequences, and information about the "next" and
"previous" genes on the chromosome (if available).
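
      A record of the kind described above could look like the following
sketch. The field names, and the idea of deriving "next"/"previous" by
sorting the mapped genes of each chromosome by map position, are my
assumptions, not an existing ListA2 format. Genes without a map position
are simply left without neighbors, matching the "(if available)".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GeneRecord:
    """One entry of a hypothetical ListA2-like table (field names are assumptions)."""
    name: str
    chromosome: str
    position_cM: Optional[float]  # genetic map position; None if not mapped
    embl_ids: list[str] = field(default_factory=list)       # pointers into EMBL
    swissprot_ids: list[str] = field(default_factory=list)  # pointers into Swissprot
    previous_gene: Optional[str] = None  # mapped neighbor at lower map position
    next_gene: Optional[str] = None      # mapped neighbor at higher map position


def link_neighbors(genes: list[GeneRecord]) -> None:
    """Fill in previous_gene/next_gene by ordering mapped genes along each chromosome."""
    by_chromosome: dict[str, list[GeneRecord]] = {}
    for gene in genes:
        if gene.position_cM is not None:  # unmapped genes get no neighbors
            by_chromosome.setdefault(gene.chromosome, []).append(gene)
    for mapped in by_chromosome.values():
        mapped.sort(key=lambda g: g.position_cM)
        # Link each consecutive pair of mapped genes in both directions.
        for left, right in zip(mapped, mapped[1:]):
            left.next_gene = right.name
            right.previous_gene = left.name
```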

      As you may know, there are about 2000 S. cerevisiae entries (nuclear
only) in the latest release of EMBL. However, I feel that this number could
be reduced to about 1500 entries (maybe fewer) by eliminating all duplicated
entries and constructing all possible contigs. For example, we clearly no
longer need all the separate entries for genes located on chromosome III.
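
      As a rough illustration of that reduction, the sketch below drops
duplicated and contained entries, then greedily merges the pair of
sequences with the longest exact end-to-start overlap until no overlap of
at least min_len bases remains. This greedy exact-overlap merge is a
simplification I am substituting for real contig construction: it ignores
reverse complements and any sequence differences between duplicates. The
default min_len=20 is an arbitrary assumption.

```python
def remove_redundant(seqs: list[str]) -> list[str]:
    """Drop exact duplicates and sequences wholly contained in another one."""
    kept: list[str] = []
    # Longest first, so any sequence that could contain s is already in kept.
    for s in sorted(set(seqs), key=len, reverse=True):
        if not any(s in k for k in kept):
            kept.append(s)
    return kept


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b, or 0 if below min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def build_contigs(seqs: list[str], min_len: int = 20) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest exact overlap."""
    contigs = remove_redundant(seqs)
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # no remaining overlap of at least min_len bases
        merged = contigs[best_i] + contigs[best_j][best_n:]
        rest = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        # The merged contig may now contain other entries, so filter again.
        contigs = remove_redundant(rest + [merged])
```

Each pass removes at least one sequence from the list, so the loop always
terminates.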

      However, these duplicated entries may differ in sequence. The
differences may be sequencing errors or polymorphisms, and we also need to
keep track of them. So another specialized database should contain pointers
indicating the differences between the duplicates, and of course it must be
connected to the others.
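
      Such a database of differences could store, for each pair of
duplicate entries, the intervals where they disagree. The sketch below is
one possible schema (all field names hypothetical) and uses Python's
standard difflib.SequenceMatcher to find the non-matching blocks. difflib
is not a biological alignment program (no substitution scores or gap
penalties), so real data would call for a proper alignment. The record says
only where the entries disagree, not whether each difference is an error or
a polymorphism.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Difference:
    """One disagreement between two duplicate entries (hypothetical schema)."""
    entry_a: str  # identifier of the first duplicate entry
    entry_b: str  # identifier of the second duplicate entry
    kind: str     # 'replace', 'delete' or 'insert', as reported by difflib
    a_start: int  # 0-based, half-open interval in entry_a's sequence
    a_end: int
    b_start: int  # 0-based, half-open interval in entry_b's sequence
    b_end: int


def record_differences(id_a: str, seq_a: str, id_b: str, seq_b: str) -> list[Difference]:
    """List every non-matching block between two duplicate sequences."""
    # autojunk=False: with only four bases, difflib's popularity heuristic
    # would otherwise treat every base as junk once seq_b reaches 200 characters.
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    return [
        Difference(id_a, id_b, tag, i1, i2, j1, j2)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```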

