This note briefly describes the first proposal for a NEAR-term solution
to the duplicate database problem. "Converging on a Sequence Database Format
Different Packages Could Share" is the software developers' topic for
3pm Thursday, August 19th, in Waterville Valley.
If you are new to this discussion, the motivation for the
sharing of databases is outlined at the end.
So far, JUST ONE CONCRETE PROPOSAL IS ON THE TABLE. Please send brief
descriptions of alternatives ahead of time so people can prepare. If you
plan to wait until the 19th, please take this description as an example of
a technical starting point that is concrete and implementable.
***********
BRIEF DESCRIPTION OF THE WHEELER PROPOSAL:
Dave Wheeler has proposed that the NCBI distribute a form of the "ENTREZ"
package (software and data) that can share sequence data with the
"SEARCHfmt" data. The change simply replaces the sequence information in
the "seq-data" field of the (ASN.1-formatted) ENTREZ data with a POINTER
to the corresponding sequence information in the SEARCHfmt database.
Because the SEARCHfmt data has already been proposed to contain the "giim"
sequence identifiers that point to entries in ENTREZ, no modifications to
the SEARCHfmt database would be necessary.
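To make the pointer idea concrete, here is a minimal Python sketch under
stated assumptions. It does not reproduce the real ASN.1 seq-data
definitions or the actual SEARCHfmt file layout. The names SearchRecord,
SeqPointer, EntrezEntry, resolve_sequence and index_by_giim are
hypothetical, and the integer "giim" field is a simplified stand-in for
the "giim" identifiers mentioned above. It shows an ENTREZ-style entry
whose sequence field holds either inline residues (the current
arrangement) or a pointer into a shared SEARCHfmt-style store (the
proposed arrangement), plus the reverse lookup from a search hit's
identifier back to its entry.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class SearchRecord:
    """Hypothetical record in a shared SEARCHfmt-style sequence store."""
    giim: int      # simplified stand-in for the "giim" id of the ENTREZ entry
    residues: str  # sequence letters, stored once and shared by all packages


@dataclass(frozen=True)
class SeqPointer:
    """Hypothetical replacement for inline seq-data: a position in the store."""
    index: int


@dataclass
class EntrezEntry:
    """Hypothetical, heavily simplified ENTREZ entry."""
    giim: int
    title: str
    seq_data: Union[str, SeqPointer]  # inline residues (current) or pointer (proposed)


def resolve_sequence(entry: EntrezEntry, store: List[SearchRecord]) -> str:
    """Return the residues for an entry, following the pointer when there is one."""
    if isinstance(entry.seq_data, SeqPointer):
        record = store[entry.seq_data.index]
        # The shared record's identifier should point back at this same entry.
        if record.giim != entry.giim:
            raise ValueError(
                f"identifier mismatch: entry {entry.giim}, record {record.giim}"
            )
        return record.residues
    return entry.seq_data  # old style: sequence stored inline in the entry


def index_by_giim(entries: List[EntrezEntry]) -> Dict[int, EntrezEntry]:
    """Reverse direction: map a search hit's identifier back to its ENTREZ entry."""
    return {entry.giim: entry for entry in entries}


if __name__ == "__main__":
    # Made-up example data, purely illustrative.
    store = [SearchRecord(101, "ACGTACGTTT"), SearchRecord(202, "MKTAYIAKQR")]
    entries = [
        EntrezEntry(101, "example nucleotide entry", SeqPointer(0)),
        EntrezEntry(202, "example protein entry", SeqPointer(1)),
        EntrezEntry(303, "entry still carrying inline data", "GGCCAATT"),
    ]
    for entry in entries:
        print(entry.giim, resolve_sequence(entry, store))
    hit = store[1]  # pretend a search program reported this record
    print(index_by_giim(entries)[hit.giim].title)
```

Keeping the residues in one place is what lets several packages share a
single copy. The back-pointer check is one cheap way to notice when an
ENTREZ release and a SEARCHfmt file have drifted out of sync.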
Glossary and references associated with Wheeler's proposal:
SEARCHfmt: A format used by the blast server and usable by fasta.
Use anonymous ftp to ncbi.nlm.nih.gov in directory pub/searchfmt
OR GOPHER to National Center for Biotechnology Information (NCBI)
ENTREZ: Interface for retrieving sequences and references over the network or from CD-ROM
Use anonymous ftp to ncbi.nlm.nih.gov in directory entrez/docs
OR GOPHER to National Center for Biotechnology Information (NCBI)
END OF BRIEF DESCRIPTION of the proposal
***********
REMINDER OF WHY WE WANT TO CONVERGE ON A SHARED DATABASE
Why converge on a database shared between packages?
1. So that Mac and PC users could download fresh data nightly (or have
client/server network access to it) for any of their packages.
2. So that all users can run multiple application packages against the
same databases without maintaining a separate copy for each package.
Why use multiple existing application packages?
Because some have superior user interfaces.
Because none is comprehensive over all manipulations.
Because some require bigger CPUs.
Why permit future products to share databases with older products?
To smooth upgrade paths for user interfaces and applications.
To permit developers to concentrate on UI and bio-analyses.
Why decouple user-interface and low-level database software development?
To permit developers to stick to their area of expertise.
To permit sequence databases to catch up technologically.
Leslie Taylor Sequence Analysis Service email:ltaylor at cgl.ucsf.edu
Computer Graphics Lab UCSF office: (415) 476-5379
Box 0446 Room S926 fax: (415) 502-1755
513 Parnassus Avenue San Francisco, CA 94143-0446