Announcement: Sagittarius PIR-44 available via anonymous FTP
VICTOR B. STRELETS
STRELETS at SCRI.FSU.EDU
Fri Jun 16 09:02:10 EST 1995
SAGITTARIUS PIR-44 (April 1995) variant
***************************************
SAGITTARIUS PIR is a highly compact variant of the original PIR database,
designed to help individual researchers and software developers use sequence
database information without large storage requirements. It contains
custom-compressed PIR information and an interface written in C that allows
fast direct access to the stored information without fully decompressing the
corresponding files. Starting with the PIR-41 version, the same databank files
and the same interface C-file can be used without modification on both
PC-compatibles and UNIX V computers (a Mac version of the interface is
forthcoming). The interface supports all standard PIR Request Network queries
(e.g. get the databank SEQ number for an entry; for a given databank SEQ
number, get specified information such as name, organism(s), keyword(s),
sequence, sequence features with coordinates, etc.). Unlike the PIR Request
Network, SAGITTARIUS PIR lets you retrieve PIR information directly from your
C program, even on a personal computer that is not connected to any network.
In addition, all numerical information, such as intron placement and feature
coordinates, is returned to the calling program as numbers rather than text
strings, which simplifies (sub)sequence manipulation.
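To illustrate why numeric coordinates are convenient, here is a minimal
sketch in C. The feature_t structure and the copy_feature() function are
hypothetical and are not part of the SAGITTARIUS interface; the sketch also
assumes 1-based, inclusive residue coordinates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature record: 1-based, inclusive residue coordinates
   (an assumption for this sketch, not the interface's actual layout). */
typedef struct {
    long start;
    long end;
} feature_t;

/* Copy the residues covered by feature f from seq into out.
   Returns the number of residues copied, or -1 if the coordinates do
   not fit the sequence or the output buffer is too small. */
static long copy_feature(const char *seq, feature_t f, char *out, size_t outlen)
{
    size_t seqlen, n;

    assert(seq != NULL && out != NULL);
    seqlen = strlen(seq);
    if (f.start < 1 || f.end < f.start || (size_t)f.end > seqlen)
        return -1;
    n = (size_t)(f.end - f.start + 1);
    if (n + 1 > outlen)
        return -1;
    memcpy(out, seq + (f.start - 1), n);  /* shift to 0-based offset */
    out[n] = '\0';
    return (long)n;
}
```

With coordinates 3 and 6, the call copies four residues starting at the third
one; no parsing of a text string such as "3-6" is needed.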
For greater compactness and flexibility, SAGITTARIUS PIR is implemented as a
collection of separate file sets, each holding one independent type of
database information (sequences, entry indexes, organisms, etc.). On a given
computer, the user can easily change the installed configuration of PIR
information as needed, without affecting retrieval of the other types of
stored information. For example, the file set that contains the protein
sequences and their PIR entry indexes (including reverse index arrays) takes
less than 14 Mb of disk space.
For PC-compatibles, a dialog shell is available that supports all standard
PIR Request Network queries plus homology searches, alignments, etc. A version
for UNIX/X Window is forthcoming.
SAGITTARIUS PIR is distributed freely (all databank file sets, the interface
C-file, a test/example program C-file, and the PC-shell executable) via
anonymous FTP. The file sets and the interface can be included in any
commercially distributed package without restriction. Consultations and
advanced interface variants (currently used to support fast, efficient
database manipulation in other SAGITTARIUS family packages) are available
from the developers upon request.
At present, the SAGITTARIUS PIR databank stores the following original
information types (fields) of the PIR database in custom-compressed form:
- database entry index
- accession number(s)
- other (non-PIR) database crossreference(s)
- protein name
- organism name(s)
- alternative protein name(s)
- keyword(s)
- superfamily name(s)
- gene name(s)
- map position(s)
- unusual start codon(s)
- intron(s) placement
- literature reference(s), including for each:
    - journal or citation
    - author(s)
    - title
    - free-format comment
- sequence feature(s)
- free-format comment
- protein sequence itself
For PIR-44, all bank files take 37+ Mb on hard disk (23+ Mb in
ZIP-compressed distribution form). Each original database field (sequences,
organisms, names, keywords, etc.) is stored in a separate file set, which
lets the user configure reduced bank variants by simply not unpacking the
unneeded files. For example, leaving out the literature references reduces
the bank to only 27+ Mb. The core variant of the databank files (the minimal
configuration supported by the available PC-shell) includes only indexes and
sequences. More complete configurations are produced by simply unpacking the
corresponding file sets from the distribution.
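The 27+ Mb figure can be checked against the unpacked file sizes given in the
control listing at the end of this announcement. The short C sketch below sums
the twelve files of the four reference file sets (REF_JOU, REF_AUTH, REF_TITL,
REF_COMM); the array and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Unpacked sizes (bytes) of the literature reference files, taken from
   the control listing in this announcement. */
static const long ref_file_sizes[] = {
    1215520L, 330572L, 456592L,   /* REF_JOU:  REF, REF0, REF1          */
    2707832L, 330572L, 456592L,   /* REF_AUTH: AUT, AUT0, AUT1          */
    3419552L, 330564L, 425216L,   /* REF_TITL: TITLE, TITLE0, TITLE1    */
    66488L,   330520L, 37784L     /* REF_COMM: REFCOM, REFCOM0, REFCOM1 */
};

/* Add up n file sizes. */
static long sum_sizes(const long *sizes, size_t n)
{
    long total = 0;
    size_t i;

    assert(sizes != NULL);
    for (i = 0; i < n; i++)
        total += sizes[i];
    return total;
}
```

Summing the array gives 10,107,804 bytes, about 10 Mb, which is consistent
with the reduction from 37+ Mb to 27+ Mb.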
List of distribution files with description of decompressed files
*****************************************************************
-------------------------------------------------------------------------------
ZipFile     ZipSize   Content Description
------------------------------------------------------\/ Core config part \/
CORE     12,542,278   Entry indexes + sequences themselves
------------------------------------------------------\/ User-variable part \/
NAME      1,137,312   Sequence names
ORGANISM    533,940   Organisms
KEYWORD     322,655   Keywords
S_FAMILY    168,410   Superfamily classifications
CROSSREF    327,537   Other database crossreferences
FEATURE   1,091,777   Sequence features
GENE_MAP     29,621   Genetic map positions
ALT_NAME    220,725   Sequence alternative names
GENE        194,431   Genes
CODON        10,430   Unusual start codons
ACC_CODE  1,569,822   PIR accession codes
COMMENT     285,206   Sequence comments
INTRON       47,096   Intron(s) placement
REF_JOU     991,882   References core : the references themselves
REF_AUTH  1,779,989   Ref. extension  : reference authors
REF_TITL  2,108,161   Ref. extension  : reference titles
REF_COMM     63,120   Ref. extension  : reference comments
------------------------------------------------------\/ Dialog shell for PC \/
PC_SHELL    194,641   PC-executable + two MAP-files (to \PIR)
------------------------------------------------------\/ Interface \/
INTERFAC     15,350   Interface and test program, C-files, PRJ file
-------------------------------------------------------------------------------
SAGITTARIUS PIR Data Bank Shell
*******************************
SAGITTARIUS PIR Automated Sequence Bank is a dialog shell for working with
the compressed sequence database information. It is intended for
MS-DOS/Windows PC-compatibles with hard disk caching software installed
(such as Smartdrive, Hyperdisk, Ncache, etc.). A P5 or 486 is recommended;
a 386 or 286 will be significantly slower but still usable.
The dialog shell supports the following main operations:
- selection of sequences into the bank buffer by
    - a dictionary-defined record for a specified information
      field (name, source, keyword, feature, etc.)
    - a user-defined context in a specified information
      field (name, source, keyword, feature, etc.)
    - a set of dictionary-defined records for different information
      fields (source, keyword, superfamily, etc.)
    - SEQ exact or inexact homology with a user-defined short sequence
- storing and retrieving buffer content (SEQ bank numbers and indexes)
  between sessions
- output of user-specified (buffer) SEQ data to disk files
- fast SEQ homology searches (for a user-defined SEQ of no more than
  50-100 residues, only 1 hour with the full PIR bank on a 486/33)
- fast subregion-sensitive pairwise alignments (a user-defined
  sequence against the buffer SEQs or the full bank)
- easy data access from user programs (C) as a support for
  application development
SAGITTARIUS databank files are normally filled with the currently available
PIR database information by the distributors only. The distribution includes
ready-to-use information files, the interface, and executables, all in
compressed form.
-----------------------------------------------------------------------------
SAGITTARIUS PIR is available by anonymous FTP from:
FTP.SCRI.FSU.EDU, directory /pub/genetics/pir/
SAGITTARIUS PIR is also available by anonymous FTP from some
of the well-known bio-servers (IUBIO etc.).
----------------------------------------------------------------------------
Installation on UNIX system
***************************
All decompressed SAGITTARIUS databank files must be placed in a single
directory, whose name must be specified correctly in the interface C-file
(in its first lines). Placing the character 'X' in the first position of
that text string (instead of '/') forces the interface to carry out a formal
check for the presence of the databank files.
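The exact form of the directory string is defined in the interface C-file
itself. As a rough sketch only, the 'X' convention described above could be
interpreted as follows; resolve_unix_dir() is a hypothetical helper, not a
function of the distributed interface, and the example path in the note below
is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the configured directory string to out.
   If it starts with 'X', restore the leading '/' and report that a
   presence check of the databank files was requested.
   Returns 1 if a check was requested, 0 if not, -1 if out is too small. */
static int resolve_unix_dir(const char *configured, char *out, size_t outlen)
{
    int check = 0;

    assert(configured != NULL && out != NULL);
    if (strlen(configured) + 1 > outlen)
        return -1;
    strcpy(out, configured);
    if (out[0] == 'X') {
        out[0] = '/';   /* 'X' stood in for the root slash */
        check = 1;
    }
    return check;
}
```

Under these assumptions, a configured string "Xusr/local/pir/" would resolve
to "/usr/local/pir/" with the presence check switched on.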
Installation on PC
******************
If you plan to use the PC-shell, all decompressed SAGITTARIUS databank files
must be placed in the directory \PIR on any single logical drive. Otherwise,
all decompressed SAGITTARIUS databank files must be placed in a single
directory, whose name must be specified correctly in the interface C-file
(in its first lines). Placing the character 'X' in the first position of
that text string (instead of a drive letter) forces the interface to
1) search for the directory on all available logical drives and 2) carry out
a formal check for the presence of the databank files.
The bank executable BANK.EXE may be placed in any directory on any logical
drive. Important: the .MAP files contained in the PC-shell compressed file
must be moved into the \PIR directory, where the other databank files are
located.
It is highly recommended to run BANK.EXE from a directory (and/or logical
drive) other than the data location, to avoid accidental damage to the bank
files. The bank is designed for access to data on a file server and can find
the \PIR directory (and test it for a correct data configuration) on any
logical drive.
----------------------------------------------------------------------------
SAGITTARIUS PIR is a FREE DOMAIN software.
This package (with compressed data files) can be redistributed
freely without any limitations but only free of charge and for
non-commercial usage. No changes in data files and/or executables
are allowed.
You may include compressed SAGITTARIUS datafiles in your application
packages freely even in the case of any commercial usage.
--------------------------------------------------------------
For helpful comments and discussions please contact
Dr. Victor B. Strelets (strelets at scri.fsu.edu)
Computational Genetics and Biophysics,
Supercomputer Computations Research Institute,
FSU B-186, Tallahassee, FL 32306-4052, USA
---------------------------------------------------------------
For verification, you may use the following information about the
SAGITTARIUS PIR distribution files (with the sizes of the files they contain):
CORE     ZIP  12,542,278
  SEQ0     BAN     330,572
  SEQ      BAN  10,911,789
  DIC      BAN     330,888
  DIC2     BAN     106,496
  IND      BAN     661,144
  IND0     BAN     330,572
  IND2     BAN     331,776
  0IND     REV     330,572
  1IND     REV     330,572
NAME     ZIP   1,137,312
  NAM      BAN   1,135,808
  NAM0     BAN     330,572
  NAM2     BAN     210,944
  0NAM     REV     209,100
  1NAM     REV     330,564
ORGANISM ZIP     533,940
  SOU      BAN     116,480
  SOU0     BAN     330,572
  SOU1     BAN      84,824
  SOU2     BAN      26,624
  0SOU     REV      26,020
  1SOU     REV     504,612
KEYWORD  ZIP     322,655
  KW       BAN      20,816
  KW0      BAN     330,388
  KW1      BAN     121,476
  KW2      BAN       6,144
  0KW      REV       5,672
  1KW      REV     323,992
GENE     ZIP     194,431
  GENE     BAN      89,792
  GENE0    BAN     279,284
  GENE1    BAN      44,160
  GENE2    BAN      36,864
  0GENE    REV      35,644
  1GENE    REV      58,816
ALT_NAME ZIP     220,725
  ANAM     BAN     151,432
  ANAM0    BAN     330,368
  ANAM1    BAN      45,028
  ANAM2    BAN      38,912
  0ANAM    REV      36,956
  1ANAM    REV      58,772
S_FAMILY ZIP     168,410
  SFAM     BAN      61,360
  SFAM0    BAN     278,972
  SFAM1    BAN      29,064
  SFAM2    BAN      14,336
  0SFAM    REV      14,020
  1SFAM    REV     134,296
ACC_CODE ZIP   1,569,822
  AC       BAN     751,104
  AC0      BAN     330,572
  AC1      BAN     377,236
  AC2      BAN     376,832
  0AC      REV     375,552
  1AC      REV     375,552
CODON    ZIP      10,430
  CDN      BAN         216
  CDN0     BAN     271,464
  CDN1     BAN         184
  CDN2     BAN       2,048
  0CDN     REV         108
  1CDN     REV       4,668
FEATURE  ZIP   1,091,777
  FT       BAN     323,616
  FT0      BAN     309,148
  FT1      BAN     328,440
  FT2      BAN      53,248
  FTN      BAN   1,132,072
  FTN0     BAN     309,148
  FTN1     BAN     328,440
  0FT      REV      53,204
  1FT      REV     270,768
GENE_MAP ZIP      29,621
  MAP      BAN      35,888
  MAP0     BAN     276,328
COMMENT  ZIP     285,206
  CC       BAN     524,056
  CC0      BAN     310,212
CROSSREF ZIP     327,537
  CR       BAN     416,416
  CR0      BAN     330,572
INTRON   ZIP      47,096
  INTR0    BAN     278,936
  INTR1    BAN      60,204
REF_JOU  ZIP     991,882
  REF      BAN   1,215,520
  REF0     BAN     330,572
  REF1     BAN     456,592
REF_AUTH ZIP   1,779,989
  AUT      BAN   2,707,832
  AUT0     BAN     330,572
  AUT1     BAN     456,592
REF_TITL ZIP   2,108,161
  TITLE    BAN   3,419,552
  TITLE0   BAN     330,564
  TITLE1   BAN     425,216
REF_COMM ZIP      63,120
  REFCOM   BAN      66,488
  REFCOM0  BAN     330,520
  REFCOM1  BAN      37,784
PC_SHELL ZIP     194,641
  BANK     EXE     535,134
  BANK1    MAP     165,600
  BANK2    MAP     165,600
INTERFAC ZIP      15,350
  INTERF   C        26,237
  BANK     H        12,163
  BANK0    H            50
  INTERF   H         6,048
  BANKTEST C         9,795
  BANKTEST PRJ       5,353
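
A size check of this kind can be automated. The sketch below is not part of
the distribution; it reads a file's size with standard C I/O so that it can be
compared with the figures above. File names such as SEQ.BAN are an assumption
based on the DOS-style listing (name, extension, size); on UNIX the case of
the names may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Return the size of a file in bytes, or -1 if it cannot be read. */
static long file_size(const char *path)
{
    FILE *fp;
    long size;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    size = ftell(fp);
    fclose(fp);
    return size;
}

/* Compare a file against its expected size.
   Returns 1 if the sizes match, 0 if they differ, -1 if the file
   is missing or unreadable. */
static int check_file(const char *path, long expected)
{
    long actual = file_size(path);

    if (actual < 0)
        return -1;
    return actual == expected;
}
```

For example, check_file("SEQ.BAN", 10911789L) would return 1 for an intact
sequence file, assuming that file name and the current directory.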
-----------------------------------------------------------------
Standard disclaimer:
Author(s) will in no way be held liable for any loss of profit or
any other commercial damage including but not limited to special,
incidental, consequential or other damages from use of this
package. You may use them only with the understanding that
you use it at your own risk and that your use of the software
and datafiles is your agreement to this disclaimer.