Distributing BLAST jobs

lmj at pasteur.fr
Thu Mar 7 04:51:07 EST 2002

In article <3C84F548.7050800 at purdue.edu>,
Rick Westerman  <westerman at purdue.edu> wrote:
>     This should be a common task so I suspect that someone has done it 
>before but I can not find a reference so any help is appreciated.
>      What I want to do is to distribute BLAST search requests to 
>multiple machines that are not hooked together in a unified way.  Basically:
>1) End user, via a web screen, says something like "I want to run blastx 
>through PIR on these 300 sequences."

I had to do this for 30000 protein sequences against the NCBI nr protein
database, and the run was repeated every month.
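The makefile approach in this reply works on one file per query sequence, so a 300- or 30000-sequence submission has to be split first. The following is a minimal Python sketch of that step, not one of the scripts mentioned in this post; the `seqNNNNN.fa` naming and the `split_fasta` helper are hypothetical.

```python
"""Split a multi-sequence FASTA text into one file per sequence.

Hypothetical naming: record i becomes <prefix><i, 5 digits>.fa in out_dir.
"""
import os


def split_fasta(fasta_text, out_dir, prefix="seq"):
    """Write each FASTA record to its own file; return the paths in input order."""
    os.makedirs(out_dir, exist_ok=True)
    records = []
    current = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # A header line starts a new record.
            current = [line]
            records.append(current)
        elif current is not None and line.strip():
            # Sequence lines belong to the most recent header; blank lines
            # and anything before the first header are ignored.
            current.append(line)
    paths = []
    for i, rec in enumerate(records, start=1):
        path = os.path.join(out_dir, f"{prefix}{i:05d}.fa")
        with open(path, "w") as fh:
            fh.write("\n".join(rec) + "\n")
        paths.append(path)
    return paths
```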

>2) Program 'X' takes the sequences and distributes them to computers 
>'A', 'B', and 'C' all of which have the blast program installed and the 
>databases installed locally.  Said computers could be a Condor-cluster, 
>MP machines, or other. All Unix-based though.

In my case, program 'X' was 'ppmake', which uses PVM (Parallel Virtual
Machine, URL:<http://www.csm.ornl.gov/pvm/pvm_home.html>). All machines
in the PVM shared the directory containing the sequences. The makefile was
set up with one target per BLAST output file, each depending on its own
sequence file.
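A makefile of the kind described can be generated rather than written by hand. This is a sketch under assumptions: the `.fa`/`.blastx` suffixes, the database name `pir`, and the helper names are hypothetical, and the recipe uses the legacy `blastall -p/-d/-i/-o` options. Whether ppmake needs anything beyond a standard makefile is not covered here.

```python
"""Generate a makefile with one BLAST output target per query file."""
import os


def make_rule(seq_path, program="blastx", database="pir"):
    """Return (target, rule text); the output file depends on its sequence file.

    The database name "pir" is a hypothetical placeholder.
    """
    stem, _ = os.path.splitext(seq_path)
    target = f"{stem}.{program}"  # hypothetical naming: q/seq1.fa -> q/seq1.blastx
    # Recipe lines in a makefile must start with a tab character.
    recipe = f"\tblastall -p {program} -d {database} -i {seq_path} -o {target}"
    return target, f"{target}: {seq_path}\n{recipe}\n"


def make_makefile(seq_paths, program="blastx", database="pir"):
    """Return makefile text whose default 'all' target lists every report."""
    targets, rules = [], []
    for path in seq_paths:
        target, rule = make_rule(path, program, database)
        targets.append(target)
        rules.append(rule)
    header = "all: " + " ".join(targets) + "\n\n"
    return header + "\n".join(rules)
```

Because each target depends only on its own sequence file, make (or ppmake) can build the targets independently and in any order; on a single multiprocessor machine, GNU `make -j N` gives the same kind of parallelism.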

All you have to do is start up PVM on the client machines and type ppmake.

>3) Program 'Y' picks up the results from the computers and gives them 
>back to the end user.

If the machines share the same directory, there is no need to transfer the
files back to the originating machine.
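With a shared directory, program 'Y' reduces to checking which reports exist and are at least as new as their inputs, which is the same test make uses to decide whether a target must be rebuilt. A minimal sketch, assuming each report sits next to its query as `<name>.blastx` (a hypothetical convention); the function names are illustrative only.

```python
"""Collect finished BLAST reports from a shared directory."""
import os


def is_up_to_date(target, prerequisite):
    """make's rule: a target is current if it exists and is not older than its prerequisite."""
    return (os.path.exists(target)
            and os.path.getmtime(target) >= os.path.getmtime(prerequisite))


def collect_results(seq_paths, suffix=".blastx"):
    """Return (finished, pending) lists of report paths; suffix is a hypothetical convention."""
    finished, pending = [], []
    for seq in seq_paths:
        out = os.path.splitext(seq)[0] + suffix
        (finished if is_up_to_date(out, seq) else pending).append(out)
    return finished, pending


def concatenate(paths, dest):
    """Concatenate finished reports into a single file for the end user."""
    with open(dest, "w") as out_fh:
        for path in paths:
            with open(path) as in_fh:
                out_fh.write(in_fh.read())
```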

>     What I want to find are programs 'X' and 'Y'.
>     One could extend this idea to Fasta searches, PFAM searches, etc.   
> Undoubtedly there are several ways to implement this; I'm not too picky 
>on how it is done.  I'm sure one of the bigger sequencing institutions
>has something like this but I can not seem to find the 'X' and 'Y' program.

This would work just as easily for other programs that can be controlled
by a makefile.

Setting up PVM was the most time-consuming part, but it was easy to do. I
installed PVM on several DEC Alpha, Sun Solaris, and SGI machines (which
means that I installed blastall on these machines, too). I opted to use
static scheduling instead of PVM's dynamic scheduling.
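The difference between static and dynamic scheduling can be illustrated with a small simulation. This is not how PVM or ppmake implement scheduling; it assumes equal-speed hosts and known per-job costs, and the host names in the example are placeholders. Static scheduling fixes each host's share in advance; dynamic scheduling hands the next job to whichever host becomes idle first.

```python
"""Static vs. dynamic assignment of BLAST jobs to hosts (toy simulation)."""
import heapq


def static_schedule(jobs, hosts):
    """Fix the assignment in advance: job i goes to hosts[i % len(hosts)]."""
    plan = {h: [] for h in hosts}
    for i, job in enumerate(jobs):
        plan[hosts[i % len(hosts)]].append(job)
    return plan


def static_makespan(costs, n_hosts):
    """Finish time of round-robin assignment: the most heavily loaded host's total."""
    loads = [0.0] * n_hosts
    for i, cost in enumerate(costs):
        loads[i % n_hosts] += cost
    return max(loads)


def dynamic_makespan(costs, n_hosts):
    """Each idle host takes the next job from a shared queue; return the finish time."""
    free_at = [0.0] * n_hosts  # min-heap of times at which each host becomes idle
    for cost in costs:
        start = heapq.heappop(free_at)  # earliest idle host takes the next job
        heapq.heappush(free_at, start + cost)
    return max(free_at)
```

With one long query among short ones (`costs = [10, 1, 1, 1]` on two hosts), round-robin finishes at time 11 while the shared queue finishes at 10. Static scheduling gives up some load balance in exchange for a fixed, easily reproduced assignment.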

If you decide to go this route, I can send you my scripts for starting PVM
on the clients, syncing the BLAST database files, and running ppmake.

>Thanks in advance,
>-- Rick
>westerman at purdue.edu

Hope this helps,


Louis Jones                             /\
Institut Pasteur                o      /  \
28, rue du Dr. Roux            /<(*)/\/    \

More information about the Bio-soft mailing list