Sequence database shell for PC: PIR-36 compressed variant

STRELETS at SCRI.FSU.EDU
Wed Jun 9 08:19:58 EST 1993


	SAGITTARIUS/PIR DataBanks Information
	*************************************
  
  SAGITTARIUS Automated Sequence Bank is a dialog shell for
object-oriented compression, storage and manipulation of sequence
database information. It is intended for MS-DOS PC-compatibles
(386 or 486 recommended) with a hard disk cache utility installed
(such as Hyperdisk, Ncache or Smartdrive). It currently supports
the GENBANK and PIR databases.

  The dialog shell provides the following main functions:
   - selection of sequences into the bank buffer by
        - a dictionary-defined record for a specified informational
          field (name, source, keyword, feature etc.)
        - a user-defined context in a specified informational
          field (name, source, keyword, feature etc.)
        - a set of dictionary-defined records for the main informational
          fields (source, keyword, superfamily etc.)
        - perfect or imperfect SEQ homology with a user-defined short sequence
   - storing and retrieving the buffer content (SEQ bank numbers and indexes)
     between sessions
   - output of user-specified (buffer) SEQ data to disk files
   - fast SEQ homology searches (for a user-defined SEQ of no more than
     50-100 positions; only 1 hour for the full bank on a 386/33)
   - fast subregion-sensitive pairwise alignments (of a user-defined
     sequence with the buffer SEQs or the full bank; only a few hours for
     the full bank on a 386/33)
   - easy data access from user programs (C) to support
     application development
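
  The selection by perfect or imperfect homology with a short
user-defined sequence can be pictured with the minimal C sketch below.
It is only an illustration, not the algorithm SAGITTARIUS uses (the
announcement does not describe it): it assumes that "imperfect" means a
bounded number of substitutions without gaps, and the function names
find_near_match and count_mismatches are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Count the positions at which a window of seq and the query differ. */
static size_t count_mismatches(const char *window, const char *query, size_t len)
{
    size_t i, mismatches = 0;

    for (i = 0; i < len; i++) {
        if (window[i] != query[i])
            mismatches++;
    }
    return mismatches;
}

/*
 * Hypothetical helper: return the 0-based offset of the first window of
 * seq that matches query with at most max_mismatches substitutions, or
 * -1 if there is no such window. Gaps are not considered (an assumption
 * of this sketch, not a statement about SAGITTARIUS).
 */
long find_near_match(const char *seq, const char *query, size_t max_mismatches)
{
    size_t seq_len = strlen(seq);
    size_t query_len = strlen(query);
    size_t start;

    if (query_len == 0 || query_len > seq_len)
        return -1;

    for (start = 0; start + query_len <= seq_len; start++) {
        if (count_mismatches(seq + start, query, query_len) <= max_mismatches)
            return (long)start;
    }
    return -1;
}
```

  With max_mismatches set to 0 the scan is a perfect-match search;
a larger value admits imperfect matches. A caller would run it over
every sequence in the buffer or in the full bank and keep the hits.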

  At present the PIR-derived compressed databank (PIR-36, 31 March 1993)
stores the following informational fields of the original database:
	  - database entry index 
	  - accession number(s) 
	  - cross-reference(s) to other (non-PIR) databases
	  - protein name 
	  - organism name 
	  - alternative protein name(s)
	  - keyword(s)
	  - superfamily name
	  - gene name
	  - host organism name
	  - map position
	  - unusual codon(s)
	  - intron(s) placement
	  - references, including for each:
	        - journal or citation
	        - author(s)
	        - title
	        - free-format comment
	  - feature(s)
	  - free-format comment
	  - protein sequence
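
  For readers who plan to access the data from their own C programs,
the field list above could be mirrored by a record such as the one
sketched below. This is not the SAGITTARIUS data interface, which this
announcement does not describe: every type, member and function name
here is hypothetical, and only a few of the listed fields are shown.

```c
#include <stddef.h>
#include <string.h>

/* One literature reference of an entry (hypothetical layout). */
struct pir_reference {
    const char *citation;   /* journal or citation */
    const char *authors;
    const char *title;
    const char *comment;    /* free-format comment */
};

/*
 * One databank entry (hypothetical layout). Optional fields are NULL,
 * with a count of 0, when the corresponding data are not available.
 */
struct pir_entry {
    const char *entry_index;            /* database entry index */
    const char *protein_name;
    const char *organism_name;
    const char *superfamily_name;
    const char **keywords;              /* keyword(s) */
    size_t n_keywords;
    const struct pir_reference *references;
    size_t n_references;
    const char *sequence;               /* protein sequence */
};

/* Return 1 if the entry carries exactly this keyword, otherwise 0. */
int entry_has_keyword(const struct pir_entry *entry, const char *keyword)
{
    size_t i;

    for (i = 0; i < entry->n_keywords; i++) {
        if (strcmp(entry->keywords[i], keyword) == 0)
            return 1;
    }
    return 0;
}
```

  A selection by keyword then amounts to calling entry_has_keyword for
each entry and collecting the entries for which it returns 1.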

  For PIR-36, all bank files take 30Mb on the hard disk (15Mb in ZIP-compressed
form). Each informational field of the original database is stored in a separate
set of files, which allows the user to build reduced bank variants. For example,
deleting the literature references reduces the bank to only 23Mb. The core (minimal
configuration) variant of the databank includes only the indexes and sequences.
All larger variants are produced by adding (unpacking from the distribution) the
corresponding file sets.


  SAGITTARIUS PIR is available by anonymous FTP from:

 - FTP.SCRI.FSU.EDU, directory pub/genetics/pir/

	


