Massively Parallel Applications in Sequence Analysis

Reinhard Doelz doelz at comp.bioz.unibas.ch
Fri Apr 2 02:59:45 EST 1993


In article <1peuciINNicd at s-crim1.dl.ac.uk>, mbpcr at s-crim1.dl.ac.uk (A. Parsons) writes:
|> The question I originally asked also had the caveat (which noone to date has
|> commented on) "How much longer can we do WITHOUT data parallel solutions for
|> searching the masses of data being generated by the HGMP?"
|> 

Sorry for playing devil's advocate, but I would say that the individual 
site can live for a long time without a parallel computer. The database 
providers might like to have one, and they could use it effectively. The 
biocomputer sites might need one (if they can afford to abandon the safe 
tracks and develop, maintain, and optimize their own hardware/software 
solutions). 

You might have seen the discussion of the suspected contaminations reported
in human cDNA libraries (Science, last week's issue). In order to run 
thousands of sequences against EMBL, SWISSPROT, GENBANK, and PIR we used a 
Silicon Graphics cluster containing 2 Indigos, a Crimson, and a 2-processor 
Power Series. It took about two to three weeks of elapsed time on this 
setup, while the machines were also used for development work, molecular 
modelling, and general GCG usage. Now if you have such a problem of running 
10000 fasta and blast jobs, how often do you need the results 'yesterday'? 
How often do you have these gigantic questions at all? If you run these 
analyses, the 'granularity' given by the fact that you run one sequence 
after another is sufficient to spread the work amongst machines. You might 
want to have a parallel version if you have a multi-processor machine, but 
the real point is that managing and maintaining a pvm installation is 
tedious in the long run. A Hierarchical Access System is good enough: you 
distribute single jobs across the net and keep using the 'established' 
software. This is also beneficial because experience with the parameters 
and the search results is generally greater with fasta or blast-type 
searches than with special algorithms. 
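
To make the 'one job per sequence' idea concrete, here is a minimal 
sketch in Python. It is not the setup we actually used: the host names, 
the dispatch() helper and the fake_search() stand-in are all hypothetical, 
and a real run_job would start the established fasta or blast program on 
the named machine instead of returning a string. It only illustrates that 
independent jobs can be handed out to whichever machine is idle, with no 
parallel algorithm involved.

```python
import queue
import threading


def dispatch(jobs, hosts, run_job):
    """Run every job exactly once; each host pulls the next job when idle.

    jobs    -- list of independent work items, here (query, database) pairs
    hosts   -- list of machine names (hypothetical in this sketch)
    run_job -- callable(host, job) that performs one search and returns a result
    """
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    results = []
    lock = threading.Lock()

    def worker(host):
        # One worker per machine: keep taking jobs until none are left.
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return
            outcome = run_job(host, job)
            with lock:
                results.append((job, host, outcome))

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def fake_search(host, job):
    """Stand-in for launching one search remotely (assumption, not a real CLI)."""
    query, database = job
    return f"{query} vs {database} on {host}"


if __name__ == "__main__":
    queries = [f"seq{i:04d}" for i in range(10)]
    databases = ["EMBL", "SWISSPROT", "GENBANK", "PIR"]
    jobs = [(q, d) for q in queries for d in databases]
    hosts = ["indigo1", "indigo2", "crimson", "power1"]  # hypothetical names
    done = dispatch(jobs, hosts, fake_search)
    print(len(done), "jobs completed")
```

Because each host takes a new job only when it has finished the previous 
one, a machine that is busy with modelling or development work simply 
takes fewer jobs, which is the behaviour you want on shared workstations. 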

The last point that comes to mind is that the question of 'similarity' is 
asked differently by database providers than by end users. A reasonable 
check for a database annotator is to look for identities, and maybe figure 
out alleles, but not for the weak homologies that most end users are 
interested in. This explicitly does not apply to the 'biocomputers' who 
search for the needle in the hay (no pun intended - just to scale the 
effort right). Those guys might not be able to live without monsters in 
the future. 


-- 
+----------------------------------+-------------------------------------+
|    Dr. Reinhard Doelz            | RFC     doelz at urz.unibas.ch         |
|      Biocomputing                | DECNET  20579::48130::doelz         |
|Biozentrum der Universitaet       | X25     022846211142036::doelz      |
|   Klingelbergstrasse 70          | FAX     x41 61 261- 6760 or 267- 2078 |
|     CH 4056 Basel                | TEL     x41 61 267- 2076 or 2247    |
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
               -----------------------------------------



