Sequence database shell for PC: PIR-36 compressed variant
STRELETS at SCRI.FSU.EDU
Wed Jun 9 08:19:58 EST 1993
SAGITTARIUS/PIR DataBanks Information
*************************************
SAGITTARIUS Automated Sequence Bank is a dialog shell for
object-oriented compression, storage and manipulation of sequence
database information, aimed at MS-DOS PC compatibles
(386 or 486 recommended) with a hard disk caching utility installed
(such as Hyperdisk, Ncache or Smartdrive). It currently supports
the GENBANK and PIR databases.
The dialog shell provides the following main features:
- selection of sequences into the bank buffer by:
  - a dictionary-defined record for a specified informational
    field (name, source, keyword, feature, etc.)
  - a user-defined context (text string) in a specified
    informational field (name, source, keyword, feature, etc.)
  - a set of dictionary-defined records for the main informational
    fields (source, keyword, superfamily, etc.)
  - perfect or imperfect SEQ homology with a user-defined short sequence
- storing and retrieving the buffer content (SEQ bank numbers and
  indexes) between sessions
- output of user-specified (buffer) SEQ data to disk files
- fast SEQ homology searches (for a user-defined SEQ of no more than
  50-100 positions; only about 1 hour against the full bank on a 386/33)
- fast subregion-sensitive pairwise alignments (of a user-defined
  sequence against the buffer SEQs or the full bank; only a few hours
  against the full bank on a 386/33)
- easy data access from user programs (C), as support for
  application development
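The announcement does not say how the imperfect-homology selection or the fast homology search is implemented. As a minimal sketch, assuming that "imperfect" means a bounded number of residue substitutions with no gaps, a short probe can be slid along each bank sequence while counting mismatches (a Hamming-distance scan). The function `find_approx` and its parameters are hypothetical and are not part of the SAGITTARIUS C interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the SAGITTARIUS API.
   Slides `probe` along `target` and reports every offset where the two
   differ in at most `max_mismatch` positions (substitutions only, no gaps).
   Up to `max_hits` offsets are written to hits[]; the return value is the
   total number of hits found, which may exceed max_hits. */
size_t find_approx(const char *target, const char *probe,
                   size_t max_mismatch, size_t *hits, size_t max_hits)
{
    assert(target != NULL && probe != NULL);
    assert(hits != NULL || max_hits == 0);

    size_t n = strlen(target), m = strlen(probe), count = 0;
    if (m == 0 || m > n)
        return 0;

    for (size_t i = 0; i + m <= n; i++) {
        size_t mism = 0;
        /* Stop comparing this window as soon as it exceeds the limit. */
        for (size_t j = 0; j < m && mism <= max_mismatch; j++)
            if (target[i + j] != probe[j])
                mism++;
        if (mism <= max_mismatch) {
            if (count < max_hits)
                hits[count] = i;
            count++;
        }
    }
    return count;
}
```

For example, scanning `MKVLAAGIVALLLAAGCSS` for `LAAG` finds offsets 3 and 12 with no mismatches allowed, and additionally offset 11 (`LLAA`) when two mismatches are allowed. The work per sequence is proportional to sequence length times probe length.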
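The term "subregion-sensitive" is not defined in the announcement. Assuming it refers to local alignment, where only the best-matching subregions of the two sequences are scored, the textbook Smith-Waterman recurrence is a natural illustration. This is a swapped-in standard technique, not necessarily the algorithm SAGITTARIUS uses; the scoring constants (match +2, mismatch -1, linear gap -2) and the function name `local_align_score` are hypothetical. Only the score is computed, using two rolling rows so memory grows with the length of one sequence.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scoring constants, not taken from SAGITTARIUS. */
enum { MATCH = 2, MISMATCH = -1, GAP = -2 };

static int max2(int x, int y) { return x > y ? x : y; }

/* Smith-Waterman local alignment score with a linear gap penalty.
   Returns the best score over all pairs of subregions of a and b
   (0 if no subregions align with a positive score), or -1 if memory
   allocation fails. */
int local_align_score(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t n = strlen(a), m = strlen(b);
    int *prev = calloc(m + 1, sizeof *prev); /* row i-1 of the DP matrix */
    int *cur = calloc(m + 1, sizeof *cur);   /* row i being filled */
    int best = 0;

    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        return -1;
    }

    for (size_t i = 1; i <= n; i++) {
        cur[0] = 0;
        for (size_t j = 1; j <= m; j++) {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? MATCH : MISMATCH);
            int h = max2(0, diag);          /* 0 lets a new local alignment start */
            h = max2(h, prev[j] + GAP);     /* gap in b */
            h = max2(h, cur[j - 1] + GAP);  /* gap in a */
            cur[j] = h;
            best = max2(best, h);
        }
        int *tmp = prev; /* the finished row becomes the previous row */
        prev = cur;
        cur = tmp;
    }

    free(prev);
    free(cur);
    return best;
}
```

With these constants, `ACGT` against `AGGT` scores 5 (three matches and one mismatch), and two sequences sharing no residues score 0. A protein search tool would more typically use a substitution matrix such as PAM or BLOSUM and affine gap penalties instead of fixed constants.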
The current PIR-derived compressed databank (PIR-36, 31 March 1993)
stores the following original database informational fields:
- database entry index
- accession number(s)
- cross-reference(s) to other (non-PIR) databases
- protein name
- organism name
- alternative protein name(s)
- keyword(s)
- superfamily name
- gene name
- organism host name
- map position
- unusual codon(s)
- intron placement(s)
- references, each including:
  - journal or citation
  - author(s)
  - title
  - free-format comment
- feature(s)
- free-format comment
- protein sequence
For PIR-36, the full set of bank files takes 30 MB of hard disk space (15 MB
in ZIP-compressed form). Each original database informational field is stored
in a separate set of files, which allows the user to build reduced bank
variants. For example, deleting the literature references reduces the bank to
only 23 MB. The core (minimal configuration) variant of the databank files
includes only the indexes and sequences. All larger variants are produced by
adding (unpacking from the distribution) the corresponding file sets.
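The per-field file layout can be pictured as a set of optional components on top of a required core. The sketch below models a bank variant as a bitmask of installed file sets; the field identifiers, the bitmask representation and all function names are hypothetical, since the announcement does not describe the actual file names or the C access interface. From the sizes quoted above, the reference file set accounts for roughly 7 MB (30 MB minus 23 MB).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for per-field file sets (only a few shown);
   the real SAGITTARIUS file names and layout are not described. */
enum field_set {
    FS_INDEX      = 1 << 0,
    FS_SEQUENCE   = 1 << 1,
    FS_ACCESSION  = 1 << 2,
    FS_NAME       = 1 << 3,
    FS_KEYWORD    = 1 << 4,
    FS_FEATURE    = 1 << 5,
    FS_REFERENCES = 1 << 6
};

/* A bank variant is modelled as a bitmask of installed file sets. */
typedef unsigned variant_t;

/* Core (minimal) variant per the announcement: indexes and sequences only. */
#define CORE_VARIANT ((variant_t)(FS_INDEX | FS_SEQUENCE))

/* True if the field's file set is installed, i.e. the field can be queried. */
static bool has_field(variant_t v, enum field_set f)
{
    return (v & (variant_t)f) != 0;
}

/* Models unpacking one more file set from the distribution. */
static variant_t add_field(variant_t v, enum field_set f)
{
    assert(f != 0 && (f & (f - 1)) == 0); /* exactly one file set at a time */
    return v | (variant_t)f;
}

/* Every usable variant must include all core file sets. */
static bool is_valid_variant(variant_t v)
{
    return (v & CORE_VARIANT) == CORE_VARIANT;
}
```

Under this model, dropping the literature references simply means clearing `FS_REFERENCES`, while any variant lacking `FS_INDEX` or `FS_SEQUENCE` is rejected as incomplete.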
SAGITTARIUS PIR is available by anonymous FTP from:
- FTP.SCRI.FSU.EDU, directory pub/genetics/pir/