ANNOUNCING SRS release 3.1
etzold at embl-heidelberg.de
Tue Sep 14 06:12:55 EST 1993
ANNOUNCEMENT
Release 3.1 of SRS (Sequence Retrieval System) is now available free of charge
to anyone in the academic scientific community. Commercial organizations
should contact the author (for addresses and other contact details, please see
the CONTACTS section of this document).
DESCRIPTION
Most of our knowledge about proteins and genes is presently stored in a
variety of libraries, mostly in flat-file format, in which simple ASCII files
contain entries in sequential order. Each entry, whose structure varies in
complexity, is divided into individual data fields; these can contain free
text but more often hold specialized information such as keywords or authors.
Through various mechanisms, the entries of many of these libraries provide
cross-references to entries in other libraries.
SRS (Sequence Retrieval System) is an information indexing and retrieval
system designed for libraries in flat-file format. SRS supports the data
structure of these libraries by providing special indices, for example for
lists of subentities such as feature tables. Indexing the cross-references
allows a complete network of libraries to be built. In this network, an entry
in one library can be linked to every other library, either directly or
through a succession of single links between neighboring, cross-referenced
libraries.
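To make the idea of indexing a flat-file library concrete, the following C
sketch builds a minimal accession-number index. It assumes an
EMBL/SWISS-PROT-style layout, in which each line starts with a two-letter line
code (ID, AC, ...) followed by three blanks, and each entry ends with a "//"
line. For every entry it records the first accession number and the byte
offset at which the entry starts, so that retrieval can seek straight to it.
The names index_entry, build_index and lookup are hypothetical, and this is
not the actual SRS index format.

```c
#include <string.h>

#define ACC_LEN 16

/* Hypothetical index record: accession number -> byte offset of its entry. */
struct index_entry {
    char accession[ACC_LEN];
    long offset;
};

/* Scan a flat-file library held in memory and record the first accession
 * number (from the first "AC" line) of every entry, together with the byte
 * offset at which that entry starts.  Entries are terminated by "//".
 * Returns the number of index records written (at most max). */
static int build_index(const char *lib, struct index_entry *idx, int max)
{
    int n = 0;
    long entry_start = 0;
    int have_acc = 0;
    const char *line = lib;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len >= 2 && strncmp(line, "//", 2) == 0) {
            /* End of entry: the next entry starts after this line. */
            entry_start = (long)((eol ? eol + 1 : line + len) - lib);
            have_acc = 0;
        } else if (!have_acc && len > 5 && strncmp(line, "AC   ", 5) == 0
                   && n < max) {
            /* Copy the accession up to the first ';' or the end of line. */
            const char *acc = line + 5;
            size_t k = 0;
            while (k < ACC_LEN - 1 && acc + k < line + len && acc[k] != ';')
                k++;
            memcpy(idx[n].accession, acc, k);
            idx[n].accession[k] = '\0';
            idx[n].offset = entry_start;
            n++;
            have_acc = 1;
        }
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return n;
}

/* Linear lookup for brevity; returns the entry offset or -1 if not found. */
static long lookup(const struct index_entry *idx, int n, const char *acc)
{
    for (int i = 0; i < n; i++)
        if (strcmp(idx[i].accession, acc) == 0)
            return idx[i].offset;
    return -1;
}
```

A real indexer would read the library from disk and keep the records sorted
(or hashed) in a separate index file; the in-memory string and the linear
lookup only keep the sketch short.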
A language, ODD (Object Design and Definition), has been designed and
developed to structure and define the data so as to cope with library formats
that are subject to continuous change. ODD is programmed at two levels: one
defines the data structures, the other the data themselves. An ODD compiler
makes the data available to the C programs that retrieve data or build
indices. A flexible parser, programmed with an extended version of
Backus-Naur Form (BNF), has been developed to extract keywords from data
fields. SRS needs both the ODD language and the programmable parser because
all integrated libraries are left in their native state and original formats.
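The kind of rule such a parser works from can be illustrated with a small
grammar for a SWISS-PROT-style keyword field (for example
"Oxidoreductase; Heme; Transmembrane."). The C sketch below hand-codes that
grammar. The grammar, the function name parse_keywords and its error
convention are our own assumptions; SRS itself is programmed through a
grammar description in its extended BNF rather than through hard-coded C like
this.

```c
#include <ctype.h>
#include <string.h>

#define KW_LEN 64

/* Hypothetical grammar for a keyword field, written in BNF:
 *   kwfield ::= keyword { ";" keyword } "."
 *   keyword ::= <any characters except ";" and ".">
 * parse_keywords() follows the grammar directly: read a keyword, then
 * accept ";" (another keyword follows) or "." (end of field).
 * Returns the number of keywords stored in out[], or -1 on error. */
static int parse_keywords(const char *s, char out[][KW_LEN], int max)
{
    int n = 0;
    for (;;) {
        while (*s == ' ')                     /* skip leading blanks */
            s++;
        const char *start = s;
        while (*s != '\0' && *s != ';' && *s != '.')
            s++;
        const char *end = s;
        while (end > start && isspace((unsigned char)end[-1]))
            end--;                            /* trim trailing blanks */
        size_t len = (size_t)(end - start);
        if (len == 0 || len >= KW_LEN || n >= max)
            return -1;            /* empty keyword, too long, or out[] full */
        memcpy(out[n], start, len);
        out[n][len] = '\0';
        n++;
        if (*s == ';') {                      /* ";" keyword: continue */
            s++;
            continue;
        }
        if (*s == '.')                        /* "." ends the field */
            return n;
        return -1;                            /* missing terminating "." */
    }
}
```

Keeping the grammar separate from the code, as SRS does, is what lets a field
format change without rewriting the extraction routine.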
DISTRIBUTION
------------------------------------------------------------------
VAX/VMS & OpenVMS
1. anonymous ftp (binary mode) to ftp.embl-heidelberg.de
directory: /pub/software/vax/srs files: srs3_1.bck
Important! The VMS backup save set will be corrupted by the transfer, so also
get the file "relnotes3_1.doc", which contains instructions for reformatting
the backup file.
2. anonymous (binary mode) ftp to biomed.uio.no
directory: disk_1:[aftp] files: srs3_1.bck
Additional files:
fixrec.c     C source to produce fixrec.exe, which fixes the record
             length of the save set if you have any problems.
fixrec.com   DCL script that does the same as the above.
-----------------------------------------------------------------
U**X
1. anonymous (binary mode) ftp to ftp.embl-heidelberg.de
directory: /pub/software/unix/srs files: srs3_1.tar.Z
2. gopher to gopher.embl-heidelberg.de port 70
same directory as above
3. gopher to Norwegian EMBnet node (biomaster.uio.no) port 70
% gopher biomaster.uio.no 70
Name=EMBnet: Norway Biotechnology Centre of Oslo (BIOMASTER)
Type=1
Port=70
Path=1/
Host=biomaster.uio.no
Follow these menu items:
1. About the Norwegian EMBnet node
9. SRS is here
1. srs3_1.tar.Z
Also, the UNIX distribution will be available via anonymous ftp from
nic.switch.ch, presumably in the directory /mirror/embnet-ch/software.
-----------------------------------------------------------------------
Note that the UNIX and the VMS releases are equivalent!
RELEASE NOTES
The Unix port has the same user interface and the same functionality as the
VMS version, but lacks the online help.
Release 3.1 remains compatible with indices built with older releases.
Indices cannot yet be shared among different operating systems (exceptions:
VMS-OpenVMS and IRIX-SunOS-Solaris).
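The notes do not say why indices cannot be shared. One plausible reason, which
is our assumption rather than something stated here, is that binary index
files store integers in the host's byte order: the IRIX, SunOS and Solaris
machines of the time were mostly big-endian (MIPS, SPARC), whereas the VAX and
Alpha processors running VMS and OpenVMS are little-endian. A common way to
make such files portable is to fix the byte order explicitly, as in this C
sketch (put_be32 and get_be32 are hypothetical helper names, not part of SRS).

```c
#include <stdint.h>

/* Store a 32-bit value (e.g. a file offset) in big-endian byte order,
 * independent of the byte order of the host writing the index. */
static void put_be32(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Read the value back; the result is the same on big- and little-endian
 * hosts, so an index written on one could be read on the other. */
static uint32_t get_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```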
The SRS 3.1 distribution supports the following databases:
Databank name    Format supported   Where to get it from
-------------    ----------------   --------------------
SWISSPROT (1)    GCG                EMBL
PIR (1)          native             MIPS
EMBL (1)         GCG                EMBL
GENBANK (1)      GCG                NCBI
NRL3D (1)        native             MIPS
PDB (1)          native             EMBL
HSSP (1)         native             EMBL
PROSITE (1)      native             EMBL
PROSITEDOC (1)   native             EMBL
BLOCKS (1)       native             NCBI
EPD (1)          native             NCBI
ECD (1)          native             EMBL
OMIM (1)         native             Welch Lab
MIMMAP (1)       EMBL               EMBL
ENZYME (1)       native             EMBL
REBASE (1)       native             EMBL
CPGISLE (1)      EMBL               EMBL
SEQANALREF (2)   EMBL               EMBL
SEQANALRABS (2)
LIT              native             EMBL
LITT             EMBL               expasy.chuge.ch
NAKAI            GCG                GCG
LIMB             native             EMBL
FASTA (1)        GCG
BLAST (1)        native
PROFILE (1)      GCG
SRS indexes the libraries in their native format; reformatting is NOT
required. Together, all the indices for a library typically need about 20% of
the size of the library itself.
Indexed fields include "Definition", "Title", "Authors", "Reference" and
"Accession". Note that the feature tables in the sequence libraries are
indexed so that a search on that index retrieves not the complete entry but
ONLY the feature (e.g., an intron or a transmembrane helix). The sequence of a
retrieved feature can be extracted if the feature is defined only by a begin
and an end position; more complicated "locations" as found in EMBL and GenBank
(join, reverse...) will be supported in the next release.
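Extracting the sequence of a simple feature amounts to taking a substring
between its begin and end positions. The C sketch below assumes the 1-based,
inclusive numbering used in EMBL and GenBank feature tables (a location
written as 10..25); extract_feature is a hypothetical name, and compound
locations such as join(...) are outside its scope.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of positions begin..end of seq, using
 * 1-based, inclusive positions as in EMBL/GenBank feature tables.
 * Returns NULL if the range is invalid or memory runs out.
 * The caller must free() the result. */
static char *extract_feature(const char *seq, long begin, long end)
{
    long seqlen = (long)strlen(seq);
    if (begin < 1 || end < begin || end > seqlen)
        return NULL;
    long n = end - begin + 1;
    char *sub = malloc((size_t)n + 1);
    if (sub == NULL)
        return NULL;
    memcpy(sub, seq + begin - 1, (size_t)n);   /* shift to 0-based offset */
    sub[n] = '\0';
    return sub;
}
```

For example, a feature at 3..6 on the sequence ACGTACGTAA yields GTAC.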
SRS allows cross-references to be indexed and thus a library network to be
built. All libraries listed above marked "(1)" are part of that network: any
entry, or set of entries, from one library can be linked to any other library
within the network (provided a link exists). Multistep links (e.g., PDB->EMBL)
are resolved automatically into a succession of single links (e.g.,
PDB->SwissProt->EMBL). All links are bidirectional.
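Resolving a multistep link amounts to finding a chain of single links between
two libraries in the network. A breadth-first search, sketched below in C,
finds the shortest such chain. The set of direct links in the table is
hypothetical (chosen so that PDB->EMBL resolves through SwissProt, as in the
example above), and this sketch does not claim that SRS uses this particular
algorithm.

```c
#include <string.h>

enum { PDB, SWISSPROT, EMBL, PROSITE, PROSITEDOC, HSSP, NLIB };

static const char *lib_name[NLIB] = {
    "PDB", "SWISSPROT", "EMBL", "PROSITE", "PROSITEDOC", "HSSP"
};

/* Hypothetical direct cross-references (not SRS's real link set).
 * Links are bidirectional, so each pair is listed only once. */
static const int direct_links[][2] = {
    {PDB, SWISSPROT}, {SWISSPROT, EMBL}, {SWISSPROT, PROSITE},
    {PROSITE, PROSITEDOC}, {PDB, HSSP}
};

/* Breadth-first search for the shortest chain of single links from
 * `from` to `to`.  Writes the libraries of the chain (both ends included)
 * to path[] and returns its length, or 0 if `to` is unreachable. */
static int link_path(int from, int to, int path[NLIB])
{
    int adj[NLIB][NLIB] = {{0}};
    int prev[NLIB], queue[NLIB], head = 0, tail = 0;
    size_t nlinks = sizeof direct_links / sizeof direct_links[0];

    for (size_t i = 0; i < nlinks; i++) {
        adj[direct_links[i][0]][direct_links[i][1]] = 1;
        adj[direct_links[i][1]][direct_links[i][0]] = 1;  /* bidirectional */
    }
    for (int i = 0; i < NLIB; i++)
        prev[i] = -1;                  /* -1 marks "not visited yet" */
    prev[from] = from;
    queue[tail++] = from;

    while (head < tail) {
        int cur = queue[head++];
        if (cur == to)
            break;
        for (int next = 0; next < NLIB; next++)
            if (adj[cur][next] && prev[next] == -1) {
                prev[next] = cur;
                queue[tail++] = next;  /* each library is queued once */
            }
    }
    if (prev[to] == -1)
        return 0;

    /* Walk back from `to` to `from`, then reverse into path[]. */
    int rev[NLIB], n = 0;
    for (int v = to; v != from; v = prev[v])
        rev[n++] = v;
    rev[n++] = from;
    for (int i = 0; i < n; i++)
        path[i] = rev[n - 1 - i];
    return n;
}
```

With these links, link_path(PDB, EMBL, path) yields the chain PDB, SWISSPROT,
EMBL (print it with lib_name[path[i]]); because every link is entered in both
directions, link_path(EMBL, PDB, path) yields the reverse chain.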
The databanks Seqanalref and Seqanalrabs (abstracts of articles listed in
Seqanalref) marked with "(2)" are linked in a separate network.
Output files from Fasta, Blast and Profilesearch can be converted into
indexable libraries and linked to the sequence libraries. This allows
navigating from reported homologies to the sequence library that was searched
and to all other libraries marked "(1)", such as Prosite, Hssp and PDB. This
facility is still somewhat experimental and will be improved soon.
SRS is well adapted to the GCG package (it writes sequences in GCG format and
reads and writes the FOSN format), but it can also be used in other
environments since it supports the PIR sequence format as well (other formats
will be added).
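GCG-format sequence files carry a checksum (the "Check:" value on the header
line that ends in ".."). The C sketch below computes it following the
commonly documented GCG algorithm: the upper-case character code of each
residue is multiplied by a weight that cycles through 1 to 57, and the sum is
taken modulo 10000. gcg_checksum is our own name; this is an illustration, not
code taken from SRS.

```c
#include <ctype.h>

/* GCG sequence checksum: weight each residue's upper-case character code
 * by its position, with the weight cycling 1, 2, ..., 57, 1, 2, ...,
 * and return the sum modulo 10000. */
static int gcg_checksum(const char *seq)
{
    long sum = 0;
    int weight = 0;
    for (; *seq != '\0'; seq++) {
        weight = weight % 57 + 1;   /* after 57 the weight restarts at 1 */
        sum += (long)weight * toupper((unsigned char)*seq);
    }
    return (int)(sum % 10000);
}
```

Because the character codes are upper-cased first, "ACGT" and "acgt" give the
same checksum (748).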
Two new commands make the maintenance of SRS quite simple:
srscheck: checks all indices and writes a shell script with all commands
needed for updating the system
srsupdate: executes that shell script (or DCL file)
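The notes do not say how srscheck decides which indices are out of date. As a
sketch under an explicit assumption, the C code below treats an index as
stale when it is missing or older than its library file, and writes one
command per stale index. The staleness rule, the function names and the
command name rebuild_index are all hypothetical placeholders, not SRS's actual
logic or commands.

```c
#include <stdio.h>
#include <sys/stat.h>

/* ASSUMED staleness rule: an index must be rebuilt if it does not exist
 * or if the library file was modified after the index was written. */
static int index_is_stale(const char *library_path, const char *index_path)
{
    struct stat lib, idx;
    if (stat(library_path, &lib) != 0)
        return 0;                    /* no library: nothing to index */
    if (stat(index_path, &idx) != 0)
        return 1;                    /* index missing */
    return lib.st_mtime > idx.st_mtime;
}

/* Write one (hypothetical) rebuild_index command per stale index to out,
 * in the spirit of the update script srscheck produces.
 * Returns the number of commands written. */
static int write_update_script(FILE *out, const char *libs[],
                               const char *indices[], int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (index_is_stale(libs[i], indices[i])) {
            fprintf(out, "rebuild_index %s\n", libs[i]);
            count++;
        }
    return count;
}
```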
SRS is flexible because all internal information is programmed in a
special-purpose information definition language (ODD). Formats can be changed,
or libraries added, without recompiling the C code. A maintenance manual
describes the use of that language in detail.
REFERENCES
Etzold T. and Argos P., SRS - an indexing and retrieval tool for flat
file data libraries. Comput. Appl. Biosci. 9:49-57(1993).
Etzold T. and Argos P., Transforming a set of biological flat file
libraries to a fast access network. Comput. Appl. Biosci. 9:59-64(1993).
ACKNOWLEDGEMENTS
The U**X port of SRS was made possible thanks to the collaboration and hard
work of Lukas Rosenthaler and Reinhard Doelz from the Swiss EMBnet node.
CONTACTS
Thure Etzold etzold at embl-heidelberg.de, Tel: 0049-6221-387529
Rodrigo Lopez rodrig