database duplication and update inefficiencies: workshop at Waterville

Leslie Taylor ltaylor at socrates.ucsf.edu
Tue Jul 6 21:40:06 EST 1993


    A 90-minute workshop for sequence software designers interested
in the duplicate sequence database problem or the database update problem
will be held at the meeting in Waterville Valley, August 17 - 22.

CONTENTS OF THIS POSTING:
    20 LINE ANNOUNCEMENT OF THE WORKSHOP
    25 LINE LOW RESOLUTION AGENDA OUTLINE
    65 LINE ELABORATED AGENDA OUTLINE

    If you are excited about some of the new sequence software
available but are frustrated by the fact that a separate copy of the
sequence databases is needed for nearly every application, or
if you wish you could do a nightly update of your sequence
databases without massive reformatting and indexing of the entire
database, then you should attend.

    SOFTWARE DEVELOPERS who are interested in addressing these
problems will be proposing some solutions at this Workshop in
Waterville.

---------------------------
 title:"Converging on a sequence database format different packages could share"

 summary:
       This workshop will elicit strategies for encapsulating database
 formats away from application packages to permit independent evolution
 of each, and to permit economical sharing of databases between packages.
 Special attention will be focused on sharing of governmentally distributed
 databases and updates between commercially distributed and academically
 distributed software packages.
---------------------------

LOW RESOLUTION AGENDA OUTLINE

1.    DEFINING THE PROBLEMS WITH REAL WORLD EXAMPLES IN LABS (15 minutes)
        Multi-vendor/multi-application sequence database duplication
        Updates of every entry instead of just the new and changed entries.
        Users and developers with these problems will describe their trials.

2.    GOAL DEFINITIONS (15 minutes): REWORDING PROBLEMS INTO A LIST OF GOALS
        Long Term 
            User's access to fresh sequence databases from any platform
            Application Programmer Interfaces to EVOLVING sequence databases
        Short Term 
            User's access to fresh sequence databases
            Application Programmer Interfaces to fresh sequence databases

3.    PROPOSED STRATEGIES (1 hour): PROPOSE SPECIFICATIONS FOR EACH LISTED GOAL

            e.g.  (Dave Wheeler's proposal, now being evaluated at NCBI)
            make sequences in ENTREZ be pointers to NCBI/"Search" formatted
                    sequences, which are blast-able and soon to be fasta-able.

            Does anyone currently share, or have a design for sharing, SYBASE
                sequences with fast alignment searches?
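
As a rough illustration of the pointer idea (not the actual ENTREZ or
NCBI "Search" design, whose formats are not described in this posting),
a keyword tool could keep only byte offsets into one shared sequence
file, so that it and an alignment tool read the same copy. The
FASTA-style layout and the names build_offset_index and fetch are
assumptions made for this sketch.

```python
import io


def build_offset_index(handle):
    """Map each record ID to the byte offset of its header line.

    Assumes a hypothetical FASTA-style binary file: a '>ID description'
    header line followed by one or more sequence lines.
    """
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            record_id = line[1:].split()[0].decode("ascii")
            index[record_id] = offset
    return index


def fetch(handle, index, record_id):
    """Follow the stored pointer and return the sequence as one string."""
    handle.seek(index[record_id])
    handle.readline()  # skip the header line
    parts = []
    while True:
        line = handle.readline()
        if not line or line.startswith(b">"):
            break
        parts.append(line.strip())
    return b"".join(parts).decode("ascii")


# Usage with an in-memory stand-in for the shared sequence file.
shared = io.BytesIO(b">A1 first\nACGT\nGG\n>B2 second\nTTAA\n")
pointers = build_offset_index(shared)
print(fetch(shared, pointers, "B2"))
```

Only the small pointer table is specific to the keyword tool here; the
sequence data itself exists once.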

-------------------------
Same 3 points outlined above, but this time slightly elaborated 
SLIGHTLY ELABORATED IF NOT MUCH HIGHER RESOLUTION AGENDA OUTLINE

1.    DEFINING THE PROBLEMS

    The current sequence database access problems in differently sized and 
differently equipped labs can be boiled down to two classes of challenge:

    1.1 The multi-vendor/multi-application problems:
        user comfort, feature evolution, and disk space

    1.2 The Update Problems:
        frequency (Do I have to wait for the CD in the mail?)
        time-cost (Must we update all data or just what is new?)
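
To make the time-cost question concrete, here is a minimal sketch of an
update that touches only new and changed entries. It assumes each entry
carries an ID and a version number; the dictionary layout and the name
apply_delta are hypothetical and do not correspond to any existing
database release format.

```python
def apply_delta(local, delta, deleted_ids=()):
    """Update `local` in place from `delta`; return the IDs actually written.

    local, delta: dict mapping entry ID -> (version, sequence)
    deleted_ids:  IDs withdrawn upstream (assumed to be listed in the update)
    """
    written = []
    for entry_id, (version, sequence) in delta.items():
        current = local.get(entry_id)
        # Write only entries that are new or carry a newer version.
        if current is None or current[0] < version:
            local[entry_id] = (version, sequence)
            written.append(entry_id)
    for entry_id in deleted_ids:
        local.pop(entry_id, None)
    return written


# Usage: one changed entry, one unchanged, one new, one withdrawn.
local = {"A": (1, "ACGT"), "B": (2, "GGCC"), "C": (1, "TT")}
delta = {"A": (2, "ACGA"), "B": (2, "GGCC"), "D": (1, "AAAA")}
print(apply_delta(local, delta, deleted_ids=["C"]))
```

The work done is proportional to the size of the update rather than the
size of the database, provided any search indexes can also be maintained
incrementally.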


    TWO PRIME EXAMPLES of the PROBLEMS DEFINED ABOVE:

    An exemplary multi-application challenge is the situation at NCBI,
where the ENTREZ keyword queries use a different database from the one
used by the blast sequence alignment probes.
    
    An exemplary multi-vendor challenge is our situation at UCSF: of
100 labs surveyed, many have bought commercial sequence analysis packages
for their Macs and PCs, yet they have no way of updating their local
databases other than by CD, nor are they using the superior user
interfaces on their local machines to access the campus-wide database
copies, which are updated nightly.



2.    GOAL DEFINITIONS: FIRST OPTIMAL LONG TERM, THEN IMPLEMENTABLE SHORT TERM

    2.1 User's LONG TERM wish list
        Packages that can, in real time, use any of several databases
        Network updates with just what is new
        Fast access to user-specified subsets of data

    2.2 Application Programmer's LONG TERM wish list
        A portable, robust, fast, documented database API library with each database


    2.3 User's SHORT TERM Requirements

    The minimal goal for the user is AT LEAST ONE frequently
updatable database format, both for people who want to run more than
one package on their machine and for people who sometimes run different
packages on different CPUs, so that either group can update its sequence
databases just once.

    2.4 Application Programmer's SHORT TERM Requirements

    The minimal goal is a database interface that works efficiently and
effectively with different applications: keyword queries as well as very
fast alignment searches. This flexibility will be of little value without
stable release schedules, good documentation for the database interface,
and wide user access to the database updates.
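
One possible shape for such an interface, sketched under assumptions:
applications code against a small abstract class, and each database
format supplies its own implementation. The names SequenceDatabase,
find_by_keyword, iter_sequences, InMemoryDatabase and ids_containing are
invented for illustration and are not taken from any existing package.

```python
from abc import ABC, abstractmethod


class SequenceDatabase(ABC):
    """Hypothetical interface that every database format would implement."""

    @abstractmethod
    def find_by_keyword(self, keyword):
        """Return the IDs of entries whose description mentions keyword."""

    @abstractmethod
    def iter_sequences(self):
        """Yield (entry ID, sequence) pairs for scanning by search tools."""


class InMemoryDatabase(SequenceDatabase):
    """Toy implementation; a real one would wrap an on-disk format."""

    def __init__(self, entries):
        # entries: dict mapping entry ID -> (description, sequence)
        self._entries = dict(entries)

    def find_by_keyword(self, keyword):
        needle = keyword.lower()
        return sorted(
            entry_id
            for entry_id, (description, _) in self._entries.items()
            if needle in description.lower()
        )

    def iter_sequences(self):
        for entry_id, (_, sequence) in self._entries.items():
            yield entry_id, sequence


def ids_containing(db, motif):
    """Stand-in for an alignment search: a plain exact-substring scan."""
    return [entry_id for entry_id, seq in db.iter_sequences() if motif in seq]


# Usage: a keyword tool and a search tool share one database object.
db = InMemoryDatabase({
    "A1": ("human insulin", "ACGTAC"),
    "B2": ("mouse actin", "TTGACG"),
})
print(db.find_by_keyword("insulin"))
print(ids_containing(db, "GTA"))
```

Because both tools see only the interface, the storage format behind it
could change without either application being rebuilt.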


3.    PROPOSED STRATEGIES (1 hour): PROPOSE SPECIFICATIONS FOR EACH LISTED GOAL

    Any developers with strategies meeting the listed requirements
are encouraged to briefly present their work.  Each presentation should
begin with a concrete, believable schedule.  Implementations
currently in use and those implementable in the next 12 months will have
top priority.  Presentations should be 5-10 minutes and organized
around the problem definitions above.


Leslie Taylor   Sequence Analysis Service           email:ltaylor at cgl.ucsf.edu 
Computer Graphics Lab UCSF 		            office: (415) 476-5379 
Box 0446 Room S926 			            fax:    (415) 502-1755
513 Parnassus Avenue San Francisco, CA 94143-0446


