Parallel Processors/Algorithms (Summary)

S S Sturrock sss at castle.ed.ac.uk
Fri May 7 07:56:29 EST 1993


In article <1sar9sINNlml at s-crim1.dl.ac.uk> mbpcr at s-crim1.dl.ac.uk (A. Parsons) writes:

Sheesh Tony!  You have been listening! :-)
>
>begin ;--------------------------------------------------------------------------------
>
>From: S S Sturrock <sss at uk.ac.ed.castle>
>Early Correspondence from Shane Sturrock (author of Mpsrch)

Very early so I figure I better correct a few points here and there.
>
>Well, MPsrch will arrive on a MasPar at Harrow (HGMP-RC) probably in the
>next week or so, they will also be evaluating BLAZE.  The EMBL installation

The Harrow installation is about to be removed; for various reasons neither
Blaze nor MPsrch seemed to get much use.

>is waiting on good network connections and an upgrade of the OS on their
>machine since that MasPar was supplied by DEC.

Well, as most of you should know, the installation has been up and running
since Feb, and in that time nearly 3000 searches have been run; not bad.
Comments about only swiss-prot being offered have been passed on, and I have
installed the latest code for both protein *AND* nucleic acid searches. Some
work still needs to be done at the Heidelberg end to offer this new code, but
once that is done the EMBL na database will be available.

>Performance wise, MPsrch is about two and a bit times quicker, performance
>of BLAZE is about 55 million cell updates a second compared with 130

The gap has widened as expected.  MPsrch performs 202 million cell updates
per second now (Version 1.5) and MPsrch_na performs just under 400 million
per sec.  Note that these figures are for 4K PEs; they scale linearly
with processor numbers.
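
As a rough sketch of what linear scaling implies, the snippet below rescales
the 4K PE figures quoted above to other machine sizes. The function names, the
query length and the database size are assumptions made up for illustration;
only the 202 million and (just under) 400 million rates come from this post.

```python
# Illustrative only: the rates are the 4K PE figures quoted in the post, and
# linear scaling with PE count is the author's statement, not measured here.
RATES_AT_4K = {
    "MPsrch": 202e6,     # protein search, cell updates per second
    "MPsrch_na": 400e6,  # nucleic acid search ("just under" this figure)
}

def scaled_rate(rate_at_4k, n_pes, base_pes=4096):
    """Estimate cell updates/sec on n_pes processors, assuming linear scaling."""
    return rate_at_4k * n_pes / base_pes

def search_seconds(query_len, db_residues, rate):
    """Compute time only: a search fills query_len * db_residues matrix cells."""
    return query_len * db_residues / rate

if __name__ == "__main__":
    for n_pes in (1024, 4096, 16384):
        for name, rate in RATES_AT_4K.items():
            mcups = scaled_rate(rate, n_pes) / 1e6
            print(f"{name:10s} {n_pes:6d} PEs: {mcups:7.1f} million cell updates/sec")

    # Hypothetical workload: 300-residue query against a 10 million residue database.
    t = search_seconds(300, 10_000_000, scaled_rate(RATES_AT_4K["MPsrch"], 4096))
    print(f"hypothetical search time on 4K PEs: {t:.1f} s")
```

On these assumptions a 1024 PE machine comes out at a quarter of the 4K rate,
i.e. about 50 million for proteins and 100 million for NA. The time estimate
ignores database loading and alignment reconstruction.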

>million on MPsrch.  More to the point though, although BLAZE is a Smith
>Waterman on the MasPar the reconstruction is by FASTDB, a word based method

I think in this case my info was not quite right; I was later informed that
the alignments were full Smith/Waterman alignments.  It still struck me as
strange that the alignment scores didn't always agree with the search; this,
I am assured, has been fixed.

>which is supposedly fast but does not guarantee to get the same alignment
>as the MasPar, certainly the scores would not agree, the score is the one
>reported by the S/W algorithm, the alignment is FASTDB.  On MPsrch, the
>reconstruction is also by full S/W and still manages to be about 10x

Reconstruction is faster now, and I have increased the maximum size of
alignments to 65000 bp for nucleic acids.

>What you may be interested in is that the smallest MasPar machine is now
>only 55K pounds including the workstation host, this gives you a 1024
>processor MasPar box with MP1 processors, upgradable to 4096 processors,
>either MP1 or MP2 (which would give over 26000 mips + 6 gigaflops) without

6 gigs is achieved with a 16K PE MP2 configuration; the 4K PE setup will
get 1.5 gigs, still not bad for a small box.

>to go for anything else, and on the minimum configuration you will still
>get 40 million cell updates a second at the moment, but should be much more

For the smallest machine you now get 50 million for protein and 100 million
for NA.

>after I come back from California, basically you will get better

Had a good time and made some definite improvements, particularly cutting
some of the cost of database loading, plus other neat tricks.

>performance than BLAZE on a far cheaper machine.  Too good to be true?

True.

>ko61fr at genius.embnet.dkfz-heidelberg.de
>Newsgroups: bionet.software
>Organization: DKFZ Heidelberg
>Cc: 
>Status: R
>
>I have been using BLAZE as long as it was freely available, and I felt the
>big disadvantage with this service was that you could not get an alignment
>of your query sequence to the hits. BLITZ (which is Mpsrch) includes alignments
>(in fact you can choose what you want, hits and/or alignments) and is
>therefore my preferred choice. The only problem I have with the BLITZ alignments
>is that sometimes single residues are aligned (i.e. a single residue with gaps
>on both sides); structurally this seems nonsense to me, but this is a minor
>problem

I am assuming it is a problem of this sort:

          ** . *.   .* * *..   * . *.**.. .. * . .  . .*  * .  * *.* *
Db     64 RFVASDRLNDDAKAKFLNKLFYATVDITDPTQFGKLAD-LCGPVEK-GI-A--I-YLSTA 117
Qy     64 RF-TDDQ--AQAEA-FIEHFSYRAHDVTDAASYAVLKEAIEEAADKFDIDGNRIFYMSVA 119

Here the Alanine and Isoleucine are floating with gaps on either side; this
is an unfortunate effect of the algorithm and the scoring, and I am looking
into correcting it at the moment.
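
For anyone wanting to check their own output for this effect, here is a
minimal sketch that flags residues with a gap on both sides in one row of an
alignment. The function name and the "-" gap character are assumptions; this
is not part of MPsrch.

```python
def floating_residues(aligned, gap="-"):
    """Return (index, residue) pairs for single residues flanked by gaps."""
    hits = []
    for i in range(1, len(aligned) - 1):
        if aligned[i] != gap and aligned[i - 1] == gap and aligned[i + 1] == gap:
            hits.append((i, aligned[i]))
    return hits

if __name__ == "__main__":
    # The Db row from the alignment shown above.
    db = "RFVASDRLNDDAKAKFLNKLFYATVDITDPTQFGKLAD-LCGPVEK-GI-A--I-YLSTA"
    for index, residue in floating_residues(db):
        print(f"floating residue {residue} at column {index}")
```

Run on the Db row above, it should pick out the same Alanine and Isoleucine;
the "GI" pair is not flagged because it is two residues long.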

The aligner for proteins and nucleic acids will prevent obvious problems of
the kind I described earlier by keeping compound indels (gaps) to a
minimum, but only where the score remains the same (i.e. the best that the
algorithm allows).  In the above case, joining the GIAI sequence to get one
gap lowers the score, so I don't do it for now.  Modified penalties (such as
affine gaps) will possibly be added as long as it can be done without too
much performance loss; where there's a will, etc.
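
To illustrate why affine gaps would change the picture, the sketch below
compares the gap cost of a scattered layout with a merged one under a flat
per-position penalty and under an affine penalty. The penalty values, the
merged layout and the function names are all assumptions chosen for
illustration; they are not MPsrch's actual parameters or scoring.

```python
from itertools import groupby

def gap_runs(aligned, gap="-"):
    """Lengths of the consecutive gap runs in one row of an alignment."""
    return [len(list(group))
            for is_gap, group in groupby(aligned, key=lambda c: c == gap)
            if is_gap]

def linear_cost(runs, per_gap):
    """Flat penalty: every gap position costs the same."""
    return per_gap * sum(runs)

def affine_cost(runs, gap_open, gap_extend):
    """Affine penalty: opening a run costs gap_open, each extra position gap_extend."""
    return sum(gap_open + gap_extend * (n - 1) for n in runs)

if __name__ == "__main__":
    scattered = "VEK-GI-A--I-YLSTA"  # tail of the Db row shown above
    merged = "VEK-----GIAIYLSTA"     # hypothetical layout with GIAI joined
    for name, row in (("scattered", scattered), ("merged", merged)):
        runs = gap_runs(row)
        print(name, runs,
              "linear:", linear_cost(runs, per_gap=10),
              "affine:", affine_cost(runs, gap_open=10, gap_extend=1))
```

With a flat penalty both layouts pay the same for their gaps, so the choice
is decided entirely by the residue match scores. With an affine penalty the
four separate gaps pay four opening costs against one for the merged gap.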

>I did not attempt to compare the BLAZE and BLITZ results systematically,
>because of the missing alignments.

This is a problem (with Blaze, supposed to be fixed).
>
>I'm also interested in this area.  You may have more luck posting to
>the comp.parallel newsgroup.  I do know that several people are
>working in this area.  I recall something on the Thinking Machines
>CM-5 (and possibly an older version on the CM-2).  There are also

The best performance I have heard of for the CM-5 is 30 million cell updates
per second on a 32 node machine; you would thus need 400+ nodes to compete
with a 4K PE MasPar.
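
The node estimate can be reproduced with a few lines, assuming the CM-5 rate
scales linearly with node count (an assumption of this sketch; the post does
not say how the CM-5 code scales). The function name is made up for
illustration.

```python
import math

CM5_RATE = 30e6  # cell updates/sec, best figure quoted for the CM-5
CM5_NODES = 32   # node count that figure was obtained on

def cm5_nodes_needed(target_rate):
    """Nodes needed to reach target_rate, assuming linear scaling per node."""
    per_node = CM5_RATE / CM5_NODES
    return math.ceil(target_rate / per_node)

if __name__ == "__main__":
    # 4K PE MasPar rates quoted earlier in this post.
    for name, rate in (("MPsrch", 202e6), ("MPsrch_na", 400e6)):
        print(f"{name}: {cm5_nodes_needed(rate)} CM-5 nodes")
```

On that assumption the 400+ figure corresponds to matching the nucleic acid
rate; matching the protein rate would take roughly half as many nodes.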

Work I did on the CM-200 showed that it was not really suitable for high
mip requirement tasks like sequence analysis; I never really managed to get
over 20 million on 16K PEs.  Mind you, I never ventured beyond C* on that
machine.  On the MasPar, assembly is so easy that you can make some really
significant gains over compiled C code (a factor of 3 for MPsrch_na).

>------------------------------------------------------------------------------
>From: Michael McKenna <MCKMICP at Edu.Yale.YCC.YaleVM> writes

>identity, but MPsrch identified related molecules
>where FASTA, BLAST, Blaze, and BLOCKS failed.

Nice :-)



-- 
 \.    That is biological Captain!    | Shane Sturrock, BRU, Darwin Building,
(}:-(       -- Mr Sturrock            | University of Edinburgh, Scotland,
 /'                                   | Untied Kingdom (Split now!) :-)



