Databases of less than N% similar proteins (or portable Smith Waterman)

Eitan Rubin bcrubin at
Thu Aug 8 06:10:13 EST 1996

Are you familiar with the following? I don't have any personal experience
with it yet.

asset.note (gcguser) Wed Feb 14 11:49:12 1996
   /* ==================================================================
   *                            PUBLIC DOMAIN NOTICE
   *               National Center for Biotechnology Information
   *  This software/database is a "United States Government Work" under the
   *  terms of the United States Copyright Act.  It was written as part of
   *  the author's official duties as a United States Government employee and
   *  thus cannot be copyrighted.  This software is freely available to the
   *  public for use. The National Library of Medicine and the U.S. Government
   *  have not placed any restriction on its use or reproduction.
   *  Although reasonable efforts have been taken to ensure the accuracy
   *  and reliability of the software and data, the NLM and the U.S.
   *  Government do not and cannot warrant the performance or results that
   *  may be obtained by using this software or data. The NLM and the U.S.
   *  Government disclaim all warranties, express or implied, including
   *  warranties of performance, merchantability or fitness for any particular
   *  purpose.
   *  Please cite
   *     A. F. Neuwald and P. Green (1994) "Detecting Patterns in Protein
   *     Sequences", J. Mol. Biol. 239:698-712.
   *  in any work or product based on this material.
   *       The data structures used in this program are part of a package
   *    of object oriented C code for molecular biological applications
   *    being developed by A. F. Neuwald.
   * ===================================================================

   ASSET (Aligned Segment Statistical Evaluation Tool) version 1.0

   Includes 3 programs: asset, purge and scan.  Each of these
   programs requires FASTA-formatted input files.

   The PURGE program removes closely related sequences from an input
   file prior to running asset.  This is important in order to reduce
   input sequence redundancy.  The command syntax for purge is:

                purge <in_file> <score>

   where <score> determines the maximum BLOSUM62 relatedness score
   between any two sequences in the output file (the output file is
   created with the name <in_file>.b<score>). A score between 100 and
   200 is recommended.

   The scan program scans a database for sequences that contain motifs
   detected by asset.  A paper describing the details of the scan and
   purge programs is in preparation.

   The ASSET program will produce a "scan file" of the locally aligned
   segment blocks by using the -f<int> option; <int> specifies the
   percentage of sequences in the input file that are required to
   contain a motif before the corresponding motif block can be included
   in the scan file.  The scan file is given the name <in_file>.sn.
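
The -f<int> threshold described above can be read as a simple percentage cutoff. The sketch below shows that reading only; it is not taken from the ASSET source. The function names, the motif_hits mapping and the use of ">=" for "required" are assumptions made for illustration. The <in_file>.sn naming is the one stated in the note.

```python
from typing import Dict, List, Set


def blocks_for_scan_file(
    motif_hits: Dict[str, Set[str]],
    n_sequences: int,
    min_percent: int,
) -> List[str]:
    """Return the motifs found in at least min_percent percent of the
    input sequences. motif_hits maps a motif label to the set of
    sequence identifiers that contain it (hypothetical structure)."""
    if n_sequences <= 0:
        raise ValueError("n_sequences must be positive")
    # Integer arithmetic avoids floating-point rounding at the cutoff.
    return [
        motif
        for motif, seq_ids in motif_hits.items()
        if 100 * len(seq_ids) >= min_percent * n_sequences
    ]


def scan_file_name(in_file: str) -> str:
    """Scan file name as described in the note: <in_file>.sn."""
    return f"{in_file}.sn"


if __name__ == "__main__":
    hits = {"motif_A": {"s1", "s2", "s3"}, "motif_B": {"s1"}}
    print(blocks_for_scan_file(hits, n_sequences=4, min_percent=50))
    print(scan_file_name("globins.fa"))
```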
