Massively Parallel Applications in Sequence Analysis

Geir Egil Hauge geirha at
Thu Apr 1 04:04:07 EST 1993

>    I should note also, since some readers of this group may be
>interested, that I now have a version of our parallel "platform" for
>sequence comparison ( Despande, Richards, and Pearson (1991) CABIOS "A
>platform for biological sequence comparison on parallel computers"
>7:237-247) running on networks of workstations using PVM (parallel
>virtual machine), a freely available package for almost any machine.
>If you are doing lots of sequence comparisons, I can provide you with 
>PVM versions for FASTA and Smith-Waterman, with BLAST to be available 
>in about a month.

My shareware package dtask v1.1s (no previous versions are available), which
runs UNIX workstations in parallel to compare biological sequences, is to be
released in about one month (as soon as it is cleared by my supervisors).

A sequence comparison program using the Smith-Waterman (1981) algorithm is
included in the package.
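
For readers unfamiliar with the algorithm, here is a minimal Python sketch of
the Smith-Waterman local alignment score. It is not the code shipped in dtask.
The function name, the linear gap penalty, and the scoring values (match +2,
mismatch -1, gap -1) are assumptions chosen for illustration; a real protein
search would normally use a substitution matrix and its own gap parameters.
Each pass through the inner loop is one "matrix cell update".

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score (linear gap penalty)."""
    prev = [0] * (len(subject) + 1)   # previous row of the score matrix
    best = 0
    for q in query:
        curr = [0]                    # first column is always 0
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # query residue aligned against a gap
            left = curr[j - 1] + gap  # subject residue aligned against a gap
            cell = max(0, diag, up, left)   # one matrix cell update
            curr.append(cell)
            if cell > best:
                best = cell
        prev = curr
    return best
```

Only two rows are kept in memory at a time, which is enough when just the
score is needed and no traceback of the alignment is required.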

I have tested the package on as many as 96 UNIX workstations in parallel.
The measured speed was 42 million matrix cell updates per second,
using an 801-residue protein query sequence against Swiss-Prot #21. (The
search took 151 seconds.)
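
As a rough consistency check on these figures, the short Python sketch below
multiplies them out. It assumes that the whole 151 seconds was spent updating
cells (startup and I/O time are ignored) and that the number of cells equals
the query length times the number of residues in the library; both are
simplifications, not statements from the package documentation.

```python
CELLS_PER_SECOND = 42e6   # reported speed
ELAPSED_SECONDS = 151     # reported run time
QUERY_LENGTH = 801        # residues in the query sequence

# Total matrix cells updated during the search.
total_cells = CELLS_PER_SECOND * ELAPSED_SECONDS

# Implied library size, assuming cells = query length * library residues.
library_residues = total_cells / QUERY_LENGTH

print(f"cells updated:    {total_cells:.3g}")
print(f"library residues: {library_residues:.3g}")
```

Under those assumptions the run corresponds to roughly 6.3 billion cell
updates and a library of about 7.9 million residues.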

The measured speedup was 32 relative to a single Sun Sparc-10 station. This is
82% of "perfect speedup". The speedup will be better on heavier jobs (longer
query sequences) and worse on lighter jobs (short query sequences). The
speedup will also be better when fewer than 96 machines are run in parallel.
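
The relation between the two speedup figures can be worked out in a few lines
of Python. This sketch assumes that "perfect speedup" means the combined speed
of all machines expressed in Sparc-10 units, which seems the natural reading
for a mixed pool of fast and slow workstations but is not spelled out in this
post.

```python
MEASURED_SPEEDUP = 32.0   # relative to one Sparc-10
EFFICIENCY = 0.82         # fraction of "perfect speedup" achieved
MACHINES = 96

# Speedup that would have been reached at 100% efficiency.
perfect_speedup = MEASURED_SPEEDUP / EFFICIENCY

# Average speed of one machine in the pool, in Sparc-10 units.
mean_relative_speed = perfect_speedup / MACHINES

print(f"perfect speedup:     {perfect_speedup:.1f}")
print(f"mean relative speed: {mean_relative_speed:.2f}")
```

On that reading, the 96 machines together were worth about 39 Sparc-10
stations, so the average machine ran at roughly 0.4 times the speed of a
Sparc-10.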

The 96 machines included SUN 3/50, SUN 3/60, Sparc-2, Sparc-10, DEC3100,
DEC5000/200, and some SGI and HP machines running System V derived UNIX
systems. (In version 1.1s of dtask, BSD signals are needed on System V
systems.) The machines must share a common file system such as NFS, and
must be able to do UNIX socket(2)/AF_INET communication.

The programs can detect when a workstation is heavily used by other users.
They then stop using that machine for a specified time before trying it
again.
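
One way to picture this back-off behaviour is the small Python sketch below.
It is an illustration only: the class name HostPool, its methods, and the idea
of passing the current time in explicitly are all hypothetical, and how dtask
actually decides that a machine is heavily used is not described here.

```python
class HostPool:
    """Track which hosts may be given work, with a cool-down for busy ones."""

    def __init__(self, hosts, cooldown_seconds):
        self.cooldown = cooldown_seconds
        # Time at which each host may next be tried; 0.0 means "right away".
        self.retry_at = {host: 0.0 for host in hosts}

    def mark_busy(self, host, now):
        """Record that `host` looked heavily loaded at time `now`."""
        self.retry_at[host] = now + self.cooldown

    def available(self, now):
        """Return the hosts whose cool-down (if any) has expired."""
        return [h for h, t in self.retry_at.items() if now >= t]


pool = HostPool(["alpha", "beta"], cooldown_seconds=300)
pool.mark_busy("beta", now=1000.0)   # beta is skipped until time 1300
```

A scheduler built on this would hand work only to the hosts returned by
available(), so a busy machine rejoins the pool automatically once its
cool-down has passed.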

I use index files so that the programs are largely independent of the
library format: only the program that creates the index files has to be
altered for a new format. A program for making index files from
Pearson/FASTA-format libraries is included.
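
To show the idea, here is a minimal Python sketch that builds an index for a
Pearson/FASTA-format library. The record layout (name, byte offset of the
sequence, residue count) and the function name are assumptions made for this
example; the actual index file format used by dtask is not described in this
post.

```python
import io


def build_index(library):
    """Return [(name, offset, length)] for each record in a binary FASTA stream."""
    index = []
    name, seq_offset, length = None, 0, 0
    while True:
        line = library.readline()
        if not line or line.startswith(b">"):
            if name is not None:           # close the previous record
                index.append((name, seq_offset, length))
            if not line:                   # end of file
                break
            fields = line[1:].split()
            name = fields[0].decode("ascii") if fields else ""
            seq_offset = library.tell()    # sequence starts after the header
            length = 0
        elif name is not None:
            length += len(line.strip())    # count residues, not newlines
    return index


example = io.BytesIO(b">P1 first protein\nMKV\nLA\n>P2\nGG\n")
index = build_index(example)
```

A search program that reads only such an index can seek straight to each
sequence, so supporting another library format means writing a new indexer
rather than changing the comparison code.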

The package, containing complete C-source, documentation and tests, will 
be available from anonymous ftp "" in about a month. 

Geir Egil Hauge

More information about the Bio-soft mailing list