New release of the SEQF database search routines

Dan Davison dbd at THEORY.BCHS.UH.EDU
Wed Jan 23 19:14:54 EST 1991


A collection of programs based on Minoru Kanehisa's SEQF library search
codes is now available for public use. Directions for retrieval from the
UH Gene-Server are given below; they will appear at IUBIO, the EMBL
File Server and FUNET shortly.

The codes have been worked over and now run on most Unix boxes, Crays,
and VMS.  Much work has been put into making the code as portable as
possible.  This does not (yet) extend to DOS compilers, though.  Don't
ask about the Mac until say 5 years after System 7 comes out...

The package consists of four programs for searching genetic sequence
libraries, plus a Cray-specific variant:

SN - Search Nucleotide, D/RNA query sequence against a nucleotide
     sequence library; 
SP - Search Protein, amino acid query sequence against a protein
     sequence library;
ST - Search Translated, amino acid query sequence against a nucleotide 
     sequence library with 3-frame translation;
SPR - Search Protein Reduced, amino acid query sequence against a
     protein sequence library, with the 20 aa alphabet reduced to 6
     letters based on charge, hydrophobicity, and size characteristics;
SU - Search Unformatted, a version of SN with I/O specially hacked for the
     Cray; it requires some care and feeding, partially documented in the
     code.  It is about 55% faster than SN for the same problems.
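
As a rough illustration of what ST and SPR do to their inputs, here is a
minimal Python sketch of 3-frame translation and alphabet reduction.  It is
not taken from the SEQF sources.  The translation uses the standard genetic
code; the six-group mapping (HYPOTHETICAL_GROUPS) is an assumption made up
for this example, since the actual SPR grouping is not given here, and the
function names are likewise invented.

```python
# Illustrative sketch only; not the SEQF implementation.

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_three_frames(nucleotides):
    """Translate the forward strand in reading frames 0, 1 and 2."""
    seq = nucleotides.upper().replace("U", "T")  # accept RNA as well as DNA
    frames = []
    for offset in range(3):
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        # Codons containing ambiguity codes become 'X'.
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


# ASSUMPTION: a made-up 6-letter grouping by charge, hydrophobicity and size.
HYPOTHETICAL_GROUPS = {
    "a": "DE",      # acidic
    "b": "KRH",     # basic
    "h": "AVLIMC",  # hydrophobic
    "r": "FWY",     # aromatic
    "p": "STNQ",    # small or polar
    "s": "GP",      # special backbone behaviour
}
REDUCE = {aa: code for code, members in HYPOTHETICAL_GROUPS.items()
          for aa in members}


def reduce_alphabet(protein):
    """Map a protein sequence onto the reduced alphabet ('x' if unknown)."""
    return "".join(REDUCE.get(aa, "x") for aa in protein.upper())
```

With this sketch, translate_three_frames("ATGGCCTAA") gives one protein
string per frame, and reduce_alphabet can be applied to both query and
library entries before comparing them.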

These codes can be used to compare two sequences against each other;
the underlying algorithm is the Needleman-Wunsch-Sellers metric
alignment, in distance mode.
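
For readers unfamiliar with distance-mode alignment, the idea can be
sketched in a few lines of Python.  This is only an illustration of the
general dynamic-programming recurrence, not the SEQF code; the unit
mismatch and gap costs and the function name are assumptions chosen for
the example.

```python
# Illustrative sketch only; not the SEQF implementation.

def alignment_distance(a, b, mismatch=1, gap=1):
    """Global alignment distance between a and b (smaller means closer).

    ASSUMPTION: mismatch and gap default to unit costs for illustration.
    """
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else mismatch
            current.append(min(
                previous[j - 1] + cost,  # match or substitution
                previous[j] + gap,       # gap in b
                current[j - 1] + gap,    # gap in a
            ))
        previous = current
    return previous[-1]
```

Identical sequences get distance 0, and each substitution, insertion, or
deletion adds its cost, so a library search amounts to reporting the
entries with the smallest distances to the query.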

[Yes, that's 5, but SU is usable on non-Crays only with some effort.]

SEQUENCE FILE FORMATS
                      
This code is designed to use most common formats; if you have a format
you want included, contact dbd at one of the addresses below.

Supported formats include GenBank, EMBL/SwissProt, Bionet/Intelligenetics/
Stanford, and straight ASCII.  The code should automatically detect the
proper type.  Note that the GCG format and the Staden code and format are
NOT supported at present.  If you have GCG files, try TOEMBL in the GCG
package for sequence file format conversion.
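
How the programs detect the format is not described here, but the general
idea can be sketched in Python by looking at the first non-blank line of a
file.  The rules below are assumptions based on the usual conventions
(GenBank entries begin with LOCUS, EMBL/SwissProt entries with an ID line,
Intelligenetics files with ";" comment lines); the function name and the
returned labels are invented for this example.

```python
# Illustrative sketch only; not the SEQF detection logic.

def guess_format(text):
    """Guess a sequence file format from its first non-blank line."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith("LOCUS"):
            return "genbank"
        if line.startswith("ID "):
            return "embl"  # SwissProt uses the same ID line convention
        if line.startswith(";"):
            return "intelligenetics"
        return "ascii"  # anything else is treated as a bare sequence
    return "unknown"  # empty input
```

A real detector would need to cope with release headers and other
preamble that can precede the first entry.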

THE CREDITS
 
The code was written by Minoru Kanehisa while with the Theoretical
Biology and Biophysics Group, Theoretical Division, Los Alamos National
Laboratory, with I/O and other modifications by Dan Davison while at LANL
and the University of Houston.  Additional I/O improvements are due to
Hugh Nicholas of the Pittsburgh Supercomputing Center (thanks!), with some
last-minute work by Ed Chen of the University of Houston.  The reduced
protein code search came out of discussions with Jim Ostell, now at
the National Center for Biotechnology Information at the National
Library of Medicine (thanks, Jim!).


University of Houston Gene-Server retrieval info:

The files are available for e-mail retrieval from the Unix directory.  The
command

send unix seqf-shar.aa seqf-shar.ab seqf-shar.ac seqf-shar.ad seqf-shar.ae
     seqf-shar.af seqf-shar.ag seqf-shar.ah seqf-shar.ai seqf-shar.aj
     seqf-shar.ak

will send all the files to you.  Remove mail headers, concatenate them all
together, and run "unshar" or just "/bin/sh filename" where "filename" is
the name of the concatenated file.  Then read "seqf.relnotes" for more info.
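
If you would rather not strip the mail headers by hand, the step can be
sketched in Python as follows.  This is a minimal sketch assuming that each
saved message consists of headers, a blank line, and then one piece of the
shar file with nothing appended (no signature or trailer); the function
names are made up for the example.

```python
# Illustrative sketch only; assumes one shar piece per saved mail message.

def strip_mail_header(message):
    """Return the message body: everything after the first blank line."""
    head, separator, body = message.partition("\n\n")
    # With no blank line there is no header block, so keep the text as is.
    return body if separator else message


def join_parts(messages):
    """Concatenate the bodies of the messages in the order given."""
    return "".join(strip_mail_header(m) for m in messages)
```

Read the saved messages in order (seqf-shar.aa through seqf-shar.ak), pass
their contents to join_parts, write the result to a file, and run
"/bin/sh filename" on it as described above.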

The shar file is available by anonymous FTP from menudo.uh.edu (129.7.1.6) as
~ftp/pub/genbank-server/unix/seqf.shar and as split files
~ftp/pub/genbank-server/unix/seqf-shar.a[a-k].


If you have questions, comments, flames, or even kind words about the
code, direct them all to:
 
Dr. Dan Davison
BCHS-5500
Dept. of Biochemical and Biophysical Sciences
University of Houston
4800 Calhoun          
Houston, TX 77204-5500

phone: 713-749-2801
fax:   713-749-3239

e-mail: davison at uh.edu (Internet)
        DAVISON at UHOU (BITNET)
        davison at uhnix1.UUCP (Usenet, new style)
        uhnix1!davison (Usenet, old style)      
        74065,41 Compu$erve (rarely!)




