Massively Parallel Applications in Sequence Analysis

Bill Pearson wrp at cyclops.micr.Virginia.EDU
Mon Mar 29 10:11:50 EST 1993


	Regarding FASTA/BLAST vs Smith-Waterman.

	I am in the process of writing up an extensive comparison of
FASTA, BLAST, and Smith-Waterman and various scoring matrices.  This
paper will be an extension of my earlier one: Pearson (1991) Genomics
"Searching Protein Sequence Libraries: Comparison of the Sensitivity
and Selectivity of the Smith-Waterman and FASTA Algorithms"
11:635-650.

	I feel uncomfortable giving away the punch line, since the
paper has neither been written nor reviewed, but one of the
conclusions is that the result of the Genomics paper, that FASTA
with optimization performs as well as Smith-Waterman, will be
supported with considerably more data and better statistical analyses.

	I should also note, since some readers of this group may be
interested, that I now have a version of our parallel "platform" for
sequence comparison (Deshpande, Richards, and Pearson (1991) CABIOS "A
platform for biological sequence comparison on parallel computers"
7:237-247) running on networks of workstations using PVM (Parallel
Virtual Machine), a freely available package for almost any machine.
If you are doing lots of sequence comparisons, I can provide you with
PVM versions of FASTA and Smith-Waterman, with BLAST to be available
in about a month.

	Here are some typical timings on a network of 12 Sparc IPCs
using PVM2.4 (PVM3.0 is a little slower):

PVM2.4, 20 protein sequences vs
annotated PIR34 (approx. 10K sequences)
(times in seconds)

nodes      11       7       3
        ------------------------
k2         78     105     207
           76     128     206
        (73.9)

k1        310     466    1070
          312     471    1083
        (94.6)

ok1       559     836    1995
          532     811    1898
        (97.3)

Smith-Waterman times are about 5X the ok1 times.  The values in parentheses
give the relative efficiency (in percent) of 11 nodes compared to 3 nodes.
Thus k2 on 11 nodes is 11/3*.739 = 2.7 times faster than on 3 nodes; ok1 is
11/3*.973 = 3.6 times faster.
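
The efficiency figures can be checked with a short calculation.  The
sketch below, in Python, assumes that "relative efficiency" means the
observed speedup (3-node time divided by 11-node time) divided by the
ideal speedup of 11/3, and that the second row of timings for each
setting is the one behind the values in parentheses.  Both are
assumptions; the post does not say how the figures were derived.  The
names timings and relative_efficiency are made up for illustration.

```python
# Second timing row for each setting, copied from the table (seconds).
# Keys of the inner dicts are node counts.
timings = {
    "k2":  {11: 76,  7: 128, 3: 206},
    "k1":  {11: 312, 7: 471, 3: 1083},
    "ok1": {11: 532, 7: 811, 3: 1898},
}


def relative_efficiency(times, many=11, few=3):
    """Observed speedup divided by the ideal speedup (many / few).

    Assumed definition: the post does not spell out the formula.
    """
    speedup = times[few] / times[many]
    return speedup / (many / few)


def report():
    """Return one summary line per setting."""
    lines = []
    for name, times in timings.items():
        speedup = times[3] / times[11]
        eff = relative_efficiency(times)
        lines.append(
            f"{name}: {speedup:.2f}x faster on 11 nodes, "
            f"efficiency {100 * eff:.1f}%"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Under these assumptions the k2 and ok1 figures come out at 73.9 and
97.3, matching the table, and k1 comes out within 0.1 of the 94.6
shown.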

Bill Pearson



