database duplication and update inefficiencies: workshop at Waterville
Leslie Taylor
ltaylor at socrates.ucsf.edu
Tue Jul 6 21:40:06 EST 1993
A 90-minute workshop for sequence software designers interested
in the duplicate sequence database problem or the database update problem
will be held at the meeting in Waterville Valley, August 17-22.
CONTENTS OF THIS POSTING:
20 LINE ANNOUNCEMENT OF THE WORKSHOP
25 LINE LOW RESOLUTION AGENDA OUTLINE
65 LINE ELABORATED AGENDA OUTLINE
If you are excited about some of the new sequence software
available but are frustrated by the fact that a separate copy of the
sequence databases is needed for nearly every application, or
if you wish you could do a nightly update of your sequence
databases without massive reformatting and indexing of the entire
database, then you should attend.
SOFTWARE DEVELOPERS who are interested in addressing these
problems will be proposing some solutions at this Workshop in
Waterville.
---------------------------
title:"Converging on a sequence database format different packages could share"
summary:
This workshop will elicit strategies for encapsulating database
formats away from application packages to permit independent evolution
of each, and to permit economical sharing of databases between packages.
Special attention will be focused on sharing of governmentally distributed
databases and updates between commercially distributed and academically
distributed software packages.
---------------------------
LOW RESOLUTION AGENDA OUTLINE
1. DEFINING THE PROBLEMS WITH REAL WORLD EXAMPLES in labs. (15 minutes)
Multi-vendor/multi-application sequence database duplication
Updates of every entry instead of just the new and changed entries.
Users and developers with these problems will describe their trials.
2. GOAL DEFINITIONS (15 minutes): REWORDING PROBLEMS INTO A LIST OF GOALS
Long Term
User's access to fresh sequence databases from any platform
Application Programmer Interfaces to EVOLVING sequence databases
Short Term
User's access to fresh sequence databases
Application Programmer Interfaces to fresh sequence databases
3. PROPOSED STRATEGIES (1 hour): PROPOSE SPECIFICATIONS FOR EACH LISTED GOAL
e.g. (Dave Wheeler's proposal, now being evaluated at NCBI)
make sequences in ENTREZ be pointers to NCBI/"Search"-formatted
sequences, which are blast-able and soon to be fasta-able.
Does anyone currently share, or have a design to share, SYBASE
sequences with fast alignment searches?
-------------------------
Same 3 points outlined above, but this time slightly elaborated
SLIGHTLY ELABORATED IF NOT MUCH HIGHER RESOLUTION AGENDA OUTLINE
1. DEFINING THE PROBLEMS
The current sequence database access problems in differently sized and
differently equipped labs can be boiled down to two classes of challenge:
1.1 The multi-vendor/multi-application problems:
user comfort, feature evolution, and disk space
1.2 The Update Problems:
frequency (Do I have to wait for the CD in the mail?)
time-cost (Must we update all data or just what is new?)
TWO PRIME EXAMPLES of the PROBLEMS DEFINED ABOVE:
An exemplary multi-application challenge is the situation at NCBI
where the ENTREZ keyword queries are using a different database than
the blast sequence alignment probes.
An exemplary multi-vendor challenge is the situation at UCSF, where
100 labs surveyed, many of which have bought commercial sequence
analysis packages for their Macs and PCs, have no way of updating
their local databases other than by CD; nor are they using the
superior user interfaces on their local machines to access the
campus-wide database copies, which are updated nightly.
2. GOAL DEFINITIONS: FIRST OPTIMAL LONG TERM, THEN IMPLEMENTABLE SHORT TERM
2.1 User's LONG TERM wish list
Packages that can, in REALtime, use any of several databases
Network updates with just what is new.
Fast access to user-specified subsets of data
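
As a rough illustration of "network updates with just what is new", the
sketch below merges a small delta into a local store instead of rebuilding
it. It is only a sketch under stated assumptions: the Entry record, its
accession and version fields, and the apply_update function are
hypothetical names invented here, not part of any existing package or
distribution format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    accession: str  # hypothetical unique identifier for a sequence entry
    version: int    # hypothetical counter, assumed to grow when an entry changes
    sequence: str


def apply_update(local: dict[str, Entry], delta: list[Entry]) -> tuple[int, int]:
    """Merge only new or changed entries into `local`; return (added, replaced)."""
    added = replaced = 0
    for entry in delta:
        current = local.get(entry.accession)
        if current is None:
            # Entry was not in the local copy at all.
            local[entry.accession] = entry
            added += 1
        elif entry.version > current.version:
            # Entry exists locally but the delta carries a newer version.
            local[entry.accession] = entry
            replaced += 1
        # Entries that are already up to date are left untouched.
    return added, replaced
```

The cost of such an update depends on the size of the delta rather than
the size of the whole database, which is the property the wish list asks for.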
2.2 Application Programmer's LONG TERM wish list
A portable, robust, fast, documented database API library with each database
2.3 User's SHORT TERM Requirements
The minimal goal for the user is AT LEAST ONE frequently
updatable database format, both for people who want to run more than
one package on their machine and for people who want to run different
packages, sometimes on different CPUs, while updating their sequence
databases just once.
2.4 Application Programmer's SHORT TERM Requirements
The minimal goal is a database interface that is efficient and effective with
different applications: keyword queries as well as very fast alignment searches.
This flexibility will be of little value without stable release schedules,
good documentation for the database interface, and wide user access to
the database updates.
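
One way to picture such an interface is sketched below: applications are
written against an abstract class, and each database format supplies its
own backend. Everything here is an assumption made for illustration. The
names SequenceDatabase, InMemoryDatabase, find_by_keyword, sequences, and
exact_match_search are hypothetical, and the exact substring match merely
stands in for a real alignment search.

```python
from abc import ABC, abstractmethod
from typing import Iterator


class SequenceDatabase(ABC):
    """Hypothetical interface; applications depend on this, not on a file format."""

    @abstractmethod
    def find_by_keyword(self, keyword: str) -> list[str]:
        """Return accessions whose annotation mentions `keyword`."""

    @abstractmethod
    def sequences(self) -> Iterator[tuple[str, str]]:
        """Yield (accession, sequence) pairs for a search tool to scan."""


class InMemoryDatabase(SequenceDatabase):
    """Toy backend standing in for one concrete on-disk format."""

    def __init__(self, records: dict[str, tuple[str, str]]):
        # records maps accession -> (annotation text, sequence)
        self._records = records

    def find_by_keyword(self, keyword: str) -> list[str]:
        needle = keyword.lower()
        return sorted(acc for acc, (note, _) in self._records.items()
                      if needle in note.lower())

    def sequences(self) -> Iterator[tuple[str, str]]:
        for acc, (_, seq) in self._records.items():
            yield acc, seq


def exact_match_search(db: SequenceDatabase, probe: str) -> list[str]:
    """Stand-in for an alignment search: accessions containing `probe` exactly."""
    return sorted(acc for acc, seq in db.sequences() if probe in seq)
```

Because exact_match_search sees only the abstract class, a second backend
for another format could be swapped in without changing the search code,
and both the keyword query and the search would read the same single copy
of the data.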
3. PROPOSED STRATEGIES (1 hour): PROPOSE SPECIFICATIONS FOR EACH LISTED GOAL
Any developers with strategies meeting the listed requirements
are encouraged to briefly present their work. Each presentation should
begin with a concrete prediction of a believable schedule. Implementations
currently in use and those implementable in the next 12 months will have
top priority. Presentations should be 5-10 minutes long and organized
around the problem definitions above.
Leslie Taylor Sequence Analysis Service email:ltaylor at cgl.ucsf.edu
Computer Graphics Lab UCSF office: (415) 476-5379
Box 0446 Room S926 fax: (415) 502-1755
513 Parnassus Avenue San Francisco, CA 94143-0446