Massively Parallel Applications in Sequence Analysis
Geir Egil Hauge
geirha at ifi.uio.no
Thu Apr 1 04:04:07 EST 1993
> I should note also, since some readers of this group may be
>interested, that I now have a version of our parallel "platform" for
>sequence comparison (Deshpande, Richards, and Pearson (1991) CABIOS "A
>platform for biological sequence comparison on parallel computers"
>7:237-247) running on networks of workstations using PVM (parallel
>virtual machine), a freely available package for almost any machine.
>If you are doing lots of sequence comparisons, I can provide you with
>PVM versions for FASTA and Smith-Waterman, with BLAST to be available
>in about a month.
My shareware package dtask v1.1s (no previous versions available), for running
UNIX workstations in parallel when comparing biological sequences, is to be
released in about one month (as soon as it is cleared by my supervisors).
A sequence comparison program using the Smith-Waterman (1981) algorithm is
included in the package.
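For readers who have not seen the algorithm, here is a minimal sketch of a
Smith-Waterman local alignment score in C. It is not the dtask code. It assumes
a simple match/mismatch score and a linear gap penalty (the constants below are
hypothetical), whereas a real protein search would use a substitution matrix
and usually length-dependent gap costs. Each pass through the inner loop is one
"matrix cell update".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scoring parameters for illustration; not taken from dtask. */
#define MATCH_SCORE      2
#define MISMATCH_SCORE (-1)
#define GAP_PENALTY      1

static int max2(int a, int b) { return a > b ? a : b; }

/* Best local alignment score of query against subject.
 * Uses one row of the score matrix H, so memory is O(length of subject).
 * Returns -1 if memory cannot be allocated. */
int sw_score(const char *query, const char *subject)
{
    assert(query != NULL && subject != NULL);

    size_t m = strlen(query);
    size_t n = strlen(subject);
    int best = 0;
    int *row = calloc(n + 1, sizeof *row);  /* row[j] = H[i-1][j] before update */

    if (row == NULL)
        return -1;

    for (size_t i = 1; i <= m; i++) {
        int diag = 0;                        /* H[i-1][j-1]; column 0 is all zero */
        for (size_t j = 1; j <= n; j++) {
            int up   = row[j];               /* H[i-1][j] */
            int left = row[j - 1];           /* H[i][j-1], already updated */
            int s = (query[i - 1] == subject[j - 1]) ? MATCH_SCORE
                                                     : MISMATCH_SCORE;
            int h = max2(0, diag + s);       /* local alignment: never below 0 */
            h = max2(h, up - GAP_PENALTY);
            h = max2(h, left - GAP_PENALTY);

            diag = up;                       /* becomes H[i-1][j-1] for column j+1 */
            row[j] = h;                      /* one matrix cell update */
            if (h > best)
                best = h;
        }
    }
    free(row);
    return best;
}
```

With these constants, sw_score("ACGT", "ACGT") gives 8 (four matches), and two
sequences with no residue in common give 0.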
I have tested the package on as many as 96 UNIX workstations in parallel.
The speed was then measured at 42 million matrix cell updates per second,
using an 801-residue protein query sequence against Swiss-Prot release 21. (It
took 151 seconds.)
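As a rough consistency check on these figures: the number of matrix cells in a
full Smith-Waterman search is the query length times the total number of
residues in the library. The small C helper below (its name is hypothetical)
works backwards from the update rate and the run time to the library size this
implies. The inputs are rounded, so the result is only approximate.

```c
#include <assert.h>

/* Library size (in residues) implied by a measured update rate and run time.
 * total cells = cells_per_second * seconds = query_length * library_residues */
double implied_library_residues(double cells_per_second, double seconds,
                                int query_length)
{
    assert(query_length > 0);
    return cells_per_second * seconds / (double)query_length;
}
```

Calling implied_library_residues(42e6, 151.0, 801) gives roughly 7.9 million
residues, that is, about 6.3 billion cell updates in total.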
The speedup was measured to be 32 relative to a single Sun Sparc-10 station.
This is 82% of "perfect speedup". The speedup will be better on heavier jobs
(longer query sequences) and worse on lighter jobs (short query sequences).
The speedup will also be better when fewer than 96 machines are run in
parallel.
Among the 96 machines were machines such as SUN 3/50, SUN 3/60, Sparc-2,
Sparc-10, DEC3100, DEC5000/200, and some SGI and HP machines running System V
derived UNIX systems. (In version 1.1s of dtask, BSD signals are needed on
System V systems.) The machines have to share a common file system such as
NFS, and must be able to do UNIX socket(2)/AF_INET communication.
The programs are built so that they can detect when a workstation is heavily
used by other users. They then stop using that machine for a specified time
before trying it again.
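The back-off behaviour described above could be sketched in C as follows. This
is an illustration only, not the dtask implementation: the struct, the function
name, the load threshold and the back-off time are all hypothetical, and how
the load on a remote machine is actually measured is left open.

```c
#include <assert.h>
#include <time.h>

/* Per-workstation bookkeeping (hypothetical layout). */
struct host_state {
    time_t retry_after;   /* host is skipped until this time; 0 = usable now */
};

/* Decide whether a host may be given work at time `now`.
 * load:         current load figure for the host, however it is obtained
 * max_load:     above this the host counts as heavily used by others
 * backoff_secs: how long to leave a busy host alone before trying again
 * Returns 1 if the host may be used, 0 if it should be skipped. */
int host_usable(struct host_state *h, double load, double max_load,
                time_t now, time_t backoff_secs)
{
    assert(h != NULL);

    if (now < h->retry_after)
        return 0;                             /* still inside the back-off window */

    if (load > max_load) {
        h->retry_after = now + backoff_secs;  /* busy: start a new back-off */
        return 0;
    }

    h->retry_after = 0;                       /* idle enough: use it */
    return 1;
}
```

A master process would call this before handing each new piece of work to a
machine, so that a busy workstation is left alone until the back-off expires.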
I use index files in such a way that the programs are largely independent of
library format. Only the program that creates the index files has to be
altered to support a new format. A program for making index files from
Pearson/FASTA-format libraries is included.
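To illustrate the idea of an index, here is a minimal C sketch that records
the byte offset of every entry in a Pearson/FASTA-format library, where each
entry starts with a '>' at the beginning of a line. The actual index file
format used by dtask is not described here, so the function name and the plain
array of offsets are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Scan a FASTA-format stream from its current position and store the byte
 * offset of each header line ('>' at the start of a line) in offsets[].
 * At most max_entries offsets are stored. Returns the number stored.
 * Assumes a seekable stream where byte counts match file offsets (UNIX). */
size_t index_fasta(FILE *fp, long *offsets, size_t max_entries)
{
    assert(fp != NULL && offsets != NULL);

    size_t count = 0;
    int c;
    int at_line_start = 1;     /* the first byte read starts a line */
    long pos = ftell(fp);      /* offset of the next byte to be read */

    while ((c = getc(fp)) != EOF) {
        if (at_line_start && c == '>' && count < max_entries)
            offsets[count++] = pos;
        at_line_start = (c == '\n');
        pos++;
    }
    return count;
}
```

A comparison program can then fseek() to a stored offset and read one entry
without knowing anything else about the layout of the library. Supporting
another library format only means writing another indexer of this kind.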
The package, containing complete C source, documentation and tests, will
be available by anonymous ftp from "ftp.ifi.uio.no" in about a month.
Geir Egil Hauge