Massively Parallel Applications in Sequence Analysis
Reinhard Doelz
doelz at comp.bioz.unibas.ch
Fri Apr 2 02:59:45 EST 1993
In article <1peuciINNicd at s-crim1.dl.ac.uk>, mbpcr at s-crim1.dl.ac.uk (A. Parsons) writes:
|> The question I originally asked also had the caveat (which noone to date has
|> commented on) "How much longer can we do WITHOUT data parallel solutions for
|> searching the masses of data being generated by the HGMP?"
|>
Sorry for playing devil's advocate, but I would say that the individual
site can live for a long time without a parallel computer. The database
providers might like to have one, and they could use it effectively. The
biocomputer sites might need one (if they can afford to leave the safe
tracks and develop, maintain, and optimize their own hardware/software
solutions).

You might have seen the discussion of the suspected contaminations reported
in human cDNA libraries (Science, last week's issue). To run thousands of
sequences against EMBL, SWISSPROT, GENBANK, and PIR we used a Silicon
Graphics cluster consisting of two Indigos, a Crimson, and a 2-processor
Power Series. It took about two to three weeks of elapsed time on this
setup, while the machines were also being used for development work,
molecular modelling, and general GCG usage. Now, if you have such a problem
of running 10000 fasta and blast jobs, how often do you need the results
'yesterday'? How often do you have such gigantic questions at all? If you
run these analyses, the 'granularity' given by the fact that you run one
sequence after another is sufficient to spread the work among machines. You
might want a parallel version if you have a multi-processor machine, but the
real point is that managing and maintaining a PVM installation is tedious in
the long run. A Hierarchical Access System is quite sufficient - you
distribute single jobs across the net and keep using the 'established'
software. This is also beneficial because experience with the parameters
and the search results is generally greater for fasta or blast-type
searches than for special algorithms.
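
To make the 'granularity' argument concrete, here is a minimal sketch in
Python of farming out one search per sequence to whichever machine is free
next. It is an illustration under assumptions, not the Hierarchical Access
System itself: the host names, the sequence IDs, and the run_one callback
(which in practice would start the usual fasta or blast command on the
given host) are all hypothetical.

```python
import queue
import threading


def run_jobs(sequences, hosts, run_one):
    """Farm out one search per sequence to whichever host is free next.

    run_one(host, sequence) is a caller-supplied (hypothetical) function
    that performs a single search on that host and returns its result.
    Sequence identifiers are assumed to be unique.
    """
    pending = queue.Queue()
    for seq in sequences:
        pending.put(seq)

    results = {}
    lock = threading.Lock()

    def worker(host):
        # Each host pulls the next sequence as soon as it finishes one,
        # so faster machines simply end up processing more jobs.
        while True:
            try:
                seq = pending.get_nowait()
            except queue.Empty:
                return
            outcome = run_one(host, seq)
            with lock:
                results[seq] = (host, outcome)

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def fake_search(host, seq):
    # Stand-in for a real search; it only reports what it was asked to do.
    return "searched %s on %s" % (seq, host)


if __name__ == "__main__":
    hosts = ["indigo1", "indigo2", "crimson"]        # hypothetical names
    sequences = ["seq%04d" % i for i in range(10)]   # hypothetical IDs
    done = run_jobs(sequences, hosts, fake_search)
    for seq, (host, _outcome) in sorted(done.items()):
        print(seq, "->", host)
```

No search code changes here: each job is an ordinary single-sequence run,
and only the dispatching is spread over the machines. Which host ends up
with which sequence depends on timing and will differ from run to run.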

The last point that comes to mind is that the question of 'similarity' is
asked differently by database providers than by end users. A reasonable
check for a database annotator is to look for identities, and maybe to
figure out alleles, but not for the weak homologies that most end users are
interested in. This explicitly leaves aside the 'biocomputers' who search
for the needle in the haystack (no pun intended - just to put the effort in
scale). Those guys might not be able to live without monsters in the future.
--
+----------------------------------+-------------------------------------+
| Dr. Reinhard Doelz | RFC doelz at urz.unibas.ch |
| Biocomputing | DECNET 20579::48130::doelz |
|Biozentrum der Universitaet | X25 022846211142036::doelz |
| Klingelbergstrasse 70 | FAX x41 61 261- 6760 or 267- 2078
| CH 4056 Basel | TEL x41 61 267- 2076 or 2247 |
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
-----------------------------------------
More information about the Bio-soft mailing list