SMP vs. Beowulf?

mathog at seqaxp.bio.caltech.edu
Thu Dec 2 12:47:48 EST 1999


Many of the key sequence analysis packages are now threaded.  That means
they can run on SMP machines and use multiple CPUs to speed things up.  The
other type of multi-CPU "machine" in common use these days is a distributed
system, like a Beowulf cluster.  In terms of price/performance ($/SPEC),
distributed machines tend to be a lot cheaper than their SMP equivalents,
especially once you start looking at N >> 2.  For this reason, many
"supercomputers" are now distributed machines.

The database search algorithms are naturals for distributed calculations.
Recent versions of Fasta come with both threaded and PVM variants.  I've
not seen a comparison of their performance, though.  Has anybody tried it on
an N-CPU SMP machine vs. an N-node distributed machine, with equivalent
CPUs on both?

BLAST, too, seems like a good candidate for distributed computing.  For
instance, imagine running BLAST against "nr" on an N-node distributed machine:

 1.  format nr into N roughly equal-sized BLAST databases (for instance, by
     assigning sequence j to a database via modulo(j,N)).
 2.  run each query sequence on all N machines, each with one chunk of the
     common database preloaded into memory.
 3.  merge the results from the N machines.
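
As a rough illustration of steps 1 and 3 only, here is a minimal Python
sketch.  It is not taken from any BLAST distribution: the function names
and the (score, subject_id) hit layout are hypothetical, and the search
itself (step 2) is assumed to happen elsewhere on each node.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical hit record: (score, subject_id). Real BLAST output has more fields.
Hit = Tuple[float, str]


def partition_round_robin(seq_ids: Iterable[str], n: int) -> List[List[str]]:
    """Step 1: assign sequence j to chunk j mod n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    chunks: List[List[str]] = [[] for _ in range(n)]
    for j, seq_id in enumerate(seq_ids):
        chunks[j % n].append(seq_id)
    return chunks


def merge_hits(per_node_hits: List[List[Hit]], top_k: int) -> List[Hit]:
    """Step 3: combine the per-node hit lists and keep the top_k best scores."""
    all_hits = (hit for hits in per_node_hits for hit in hits)
    return heapq.nlargest(top_k, all_hits, key=lambda hit: hit[0])
```

Round-robin assignment keeps the sequence counts within one of each other,
but not necessarily the residue counts, which is why the chunks are only
"equal" in quotes.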

Other than the need to correct the statistics for the true (full) database
size, it at least seems straightforward.  However, while a threaded BLAST is
available, I've not been able to find anything that appears to be intended
for use on distributed systems.
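
One way the database-size correction might look, as a sketch only: under
Karlin-Altschul statistics the expected number of chance hits is
E = K*m*n*exp(-lambda*S), which is proportional to the database length n.
Assuming that proportionality holds and ignoring effective-length (edge)
corrections, an E-value computed against one chunk can be rescaled to the
full database.  The function and parameter names below are hypothetical.

```python
def rescale_evalue(chunk_evalue: float,
                   chunk_db_length: int,
                   full_db_length: int) -> float:
    """Rescale an E-value from a database chunk to the full database.

    Assumes E is linear in database length (total residues) and ignores
    effective-length corrections, so the result is an approximation.
    """
    if chunk_db_length <= 0 or full_db_length <= 0:
        raise ValueError("database lengths must be positive")
    return chunk_evalue * (full_db_length / chunk_db_length)
```

With N nearly equal chunks the factor is close to N, so a hit that looks
significant within one chunk is roughly N times less so against all of nr.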

Is there a distributed implementation of BLAST as well?

If not, is it because it's been tried and failed, or because nobody has
attempted it yet?

Thanks,

David Mathog
mathog at seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech 




More information about the Bio-soft mailing list