Regular homology searches

Roland Walker walker at ncbi.nlm.nih.gov
Fri Jul 10 14:57:32 EST 1998


Juergen Pleiss wrote:
> We want to regularly search sequence databases for proteins which are homologous to a given target.

This is easy to do with a shell script using the SEALS package:

   http://www.ncbi.nlm.nih.gov/Walker/SEALS/index.html

Many variations are possible on the following script.  Starting with
a fasta library of your favorite sequences, it uses PSI-BLAST to find
new homologs, mails them to you, and adds them to the library.  The
latest additions are kept in a file named after the input file, with
the suffix '_new'.

What is even more interesting is that by using the 'tax_filt' command,
you can limit or sort your new additions by any taxonomic node.

If you also want software to keep your databases updated, this
functionality is planned for SEALS (two releases from now).

Email me with any questions.

R

#!/bin/sh
#
#  newhits
#
#  Notify me when new sequences that match my list of
#  favorites are added to the database.
#
#  Suitable for use as a cron job.
#
#  Invoke like this
#
#     newhits favorites.fa
#

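# Quote "$1" so that a missing argument or a path containing spaces
# is caught here instead of slipping past the test.
if test ! -s "$1"; then
  echo ' ' >&2
  echo " $0: input file '$1' is empty or does not exist" >&2
  echo ' ' >&2
  exit 1
fi

# Search nr with PSI-BLAST, convert the hits that pass the cutoff to
# fasta, filter them against the library, and save the rest.
splishpgp nr "$1" -proc= smart -psi= 3 | blast2gi -pcut= .001 | \
gi2fasta | fanot "$1" > "${1}_new"

# If anything new turned up, append it to the library and mail it.
if test -s "${1}_new"; then
  cat "${1}_new" >> "$1"
  mailme New members for "$1" can be found in "${1}_new" < "${1}_new"
# else mailme No new members for "$1" today      # uncomment if you like
fi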

#
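
To try the append-and-notify logic before SEALS is installed, the
control flow of the script can be exercised with stand-ins.  What
follows is only a sketch under stated assumptions: find_new_hits,
notify and newhits_dry_run are hypothetical names made up for this
example, not SEALS commands, and the demo record is invented data.

```shell
#!/bin/sh
# Dry run of the newhits control flow.  find_new_hits and notify are
# hypothetical stand-ins for the SEALS search pipeline and for mailme.

find_new_hits() {
  # Stand-in: pretend the search found one record not yet in "$1".
  printf '>demo_hit hypothetical new homolog\nMKTAYIAKQR\n'
}

notify() {
  # Stand-in for mailme: print the subject, then the body from stdin.
  echo "Subject: $*"
  cat
}

newhits_dry_run() {
  lib=$1
  # Same guard as the real script: the library must exist and be non-empty.
  if test ! -s "$lib"; then
    echo "newhits_dry_run: input file '$lib' is empty or does not exist" >&2
    return 1
  fi
  # Overwrite the _new file on every run, as the real script does.
  find_new_hits "$lib" > "${lib}_new"
  # Append and notify only when something was found.
  if test -s "${lib}_new"; then
    cat "${lib}_new" >> "$lib"
    notify "New members for $lib can be found in ${lib}_new" < "${lib}_new"
  fi
}
```

Sourcing this file and running newhits_dry_run on a small test
library should append the demo record to it and print the message
that mailme would otherwise have sent.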



=========================================================
> 
> Dr.Juergen Pleiss
> Institute of Technical Biochemistry
> University of Stuttgart          Email:jpleiss at tebio1.biologie.uni-stuttgart.de
> Allmandring 31                   Phone:(+49)-711-685-3191
> D-70569 Stuttgart, Germany       Fax:  (+49)-711-685-3196
> W3 home page:  http://www.itb.uni-stuttgart.de:8080/



