Fasta, profile searches using BIOACCELERATO
Laurent Duret
duret at evoserv.univ-lyon1.fr
Fri Mar 18 02:09:38 EST 1994
> Anyone know anything about BIOACCELERATOR from Compugen (?)
I just know that Eli Mintz from Compugen Ltd. (Israel) is involved in this
work. He gave a lecture/demonstration in Grenoble (France) in November
(see lecture summary below).
> Does anyone know about this product or in fact have one?
You may ask Jean-Jacques Codani (email: Jean-Jacques.Codani at inria.fr).
Amicalement,
Laurent Duret
Laboratoire de Biometrie, Genetique et Biologie des Populations
URA CNRS 243 Universite Claude Bernard - Lyon I
43, Bd du 11 Novembre 1918 F-69622 Villeurbanne cedex
///////////////////////////////////////////////////////////////////////////
--------------------------------------------------------------------------
Title: The BIOCCELERATOR - A Sequence Analysis Accelerator Compatible
With The GCG Program Suite
Abstract:
The Bioccelerator is a supercomputer that accelerates DNA and protein
sequence analysis functions, such as Smith-Waterman database searches
and Gribskov et al. Profile Searches, 100 to 1000 times relative to high
end workstations. The BIOCCELERATOR is fully compatible with the Genetics
Computer Group's (GCG) suite of programs and is transparent to the user
except for speed enhancements. Knowledge of how to use the GCG program
suite is all that is required in order to use the Bioccelerator
effectively.
In the presentation, the BIOCCELERATOR's concept, architecture and usage
will be explained. Following the presentation, a few examples will be
run on the BIOCCELERATOR to demonstrate its abilities.
---------------------------------------------------------------------------
T H E   B I O C C E L E R A T O R
=====================================
Key Features of the BIOCCELERATOR
---------------------------------
- Provides 2-3 orders of magnitude acceleration relative to high end
workstations for Smith and Waterman database searches, reaching speeds
up to 320 million matrix cells per second.
- The only available solution for fast Profile Searches, enabling searches
at speeds up to 320 million matrix cells per second.
- Fully compatible with GCG's program suite and database formats.
- Accelerates Fasta up to 10 times relative to high end workstations.
- Supports the Fasta, Profilesearch and Pileup functions from the GCG
program suite.
- User controlled through a simple Application Programming Interface
(API).
- Accessible from various platforms through a SCSI-2 compatible interface.
- Seamless network integration enables sharing the BIOCCELERATOR among
many users.
- Simple installation and maintenance.
- Three-year warranty on hardware.
- Easy upgrade path to second generation BIOCCELERATORs.
General Description
-------------------
The BIOCCELERATOR is a hybrid between application specific hardware and
a general purpose computer. It offers the speed advantages of
application specific hardware while retaining the flexibility of
programmable machines (at the cost of programming complexity). The
BIOCCELERATOR is not a massively parallel machine. Rather, it utilizes
up to 16 custom designed ultra fast processors. Since the processors'
"personality" is re-programmable, the BIOCCELERATOR can accommodate many
algorithms.
The BIOCCELERATOR's architecture is modular and accommodates up to four
modules, each containing up to 4 processors. The maximum configuration
is therefore 16 processors on 4 modules and the minimum configuration is
2 processors on 1 module.
BIOCCELERATOR Configurations
Modules   Processors   Million Matrix Cells per Second
--------------------------------------------------
1         2            40
1         4            80
2         8            160
3         12           240
4         16           320
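The table is consistent with a fixed rate of 20 million matrix cells per second per processor. As a minimal sketch of that relationship (the per-processor constant is derived from the table and is an assumption, not a figure stated by Compugen; the function name is hypothetical):

```python
# Assumption: throughput grows linearly with the number of processors,
# as the configuration table suggests (40 million cells/s for 2 processors).
MCELLS_PER_PROCESSOR = 20  # million matrix cells per second, derived from the table


def throughput_mcells(processors: int) -> int:
    """Estimated peak throughput in million matrix cells per second."""
    if not 2 <= processors <= 16:
        raise ValueError("the brochure lists configurations of 2 to 16 processors")
    return processors * MCELLS_PER_PROCESSOR


if __name__ == "__main__":
    for n in (2, 4, 8, 12, 16):
        print(n, throughput_mcells(n))
```

Each row of the table can be reproduced this way, which suggests that the quoted speeds are peak rates that assume every processor is kept busy.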
GCG Compatibility
-----------------
Using the BIOCCELERATOR is extremely easy. Since it is fully integrated
with the GCG program suite, calling the familiar GCG functions is all
that is required. If the software detects the presence of the
BIOCCELERATOR, it is used to accelerate the calculations. If the
BIOCCELERATOR is not detected, the software-only version of the program
is run. The algorithms used by the hardware and software-only versions
of the product are identical. The output generated by the two versions
for the same input is exactly the same.
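The fallback behaviour described above can be pictured as a simple dispatch. The sketch below is illustrative only: every name in it is hypothetical, the scoring is a toy stand-in, and the "accelerated" routine just reuses the software one to mirror the claim that both versions give identical output. It does not represent the actual GCG or Compugen code.

```python
def software_search(query, database):
    """Toy software-only search: score = number of identical positions."""
    return [(name, sum(a == b for a, b in zip(query, seq)))
            for name, seq in database]


def accelerated_search(query, database):
    """Hypothetical hardware path.

    A real implementation would hand the work to the accelerator. Here it
    reuses the software routine, so both paths return the same result.
    """
    return software_search(query, database)


def search(query, database, accelerator_present):
    """Use the accelerated routine when the hardware is detected."""
    backend = accelerated_search if accelerator_present else software_search
    return backend(query, database)


if __name__ == "__main__":
    db = [("seq1", "ACGT"), ("seq2", "ACCA")]
    print(search("ACGA", db, accelerator_present=False))
```

The point of the design is that the caller never chooses the backend explicitly; only the speed of the call changes.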
When installing the BIOCCELERATOR there is no need to alter the GCG
databases. All GCG database formats are supported. The hardware version
of the programs will be constantly updated to reflect the updates in the
GCG package. Hardware support for additional functions will also be
added over time.
Application Programming Interface
---------------------------------
For researchers who write their own programs or do not use the GCG
package, Compugen offers an Application Programming Interface that will
enable them to control the BIOCCELERATOR from any application. Compugen
will also support researchers who use modified versions of the
established search algorithms and will customize the BIOCCELERATOR
programs to their specific needs.
Standard SCSI-2 Interface
-------------------------
In order to function properly, the BIOCCELERATOR must be connected to a
host. The host is mainly responsible for the user interface and for
supplying the BIOCCELERATOR with the necessary data. The host connects
to the BIOCCELERATOR via a standard SCSI-2 interface. In essence, the
BIOCCELERATOR is a SCSI-2 compatible peripheral, like a disk drive.
Software drivers are currently available for Silicon Graphics and Sun
workstations.
Seamless Network Integration
----------------------------
The BIOCCELERATOR is network ready. The server has to reside on the
workstation that is physically connected to the hardware but can be
accessed through Remote Procedure Calls (RPC) from any workstation on
the network. Please note that transferring databases through the network
may slow down searches. The recommended way to use the BIOCCELERATOR is
to remotely login to the machine connected to it.
Simple Installation and Maintenance
-----------------------------------
Installing the BIOCCELERATOR is straightforward and involves connecting
two cables, loading software and running a verification test. The
BIOCCELERATOR checks itself using a series of automatic tests that are
run periodically. If any problem is detected the user is notified and
given precise actions to follow.
Three Years Hardware Warranty
-----------------------------
The BIOCCELERATOR is built according to the most stringent standards,
enabling us to offer our customers a three-year warranty on the hardware.
In the unlikely case that the product fails, a new one will be shipped
immediately.
GCG FUNCTIONS SUPPORTED
-----------------------
As of January 1994, the BIOCCELERATOR will support a number of functions
from the GCG program suite:
Smith and Waterman database search
This new GCG function will only work on hosts with access to the
BIOCCELERATOR. The function operates both on protein and DNA databases.
The algorithm used is the Smith and Waterman algorithm as modified by
Gotoh with full affine scoring.
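For readers unfamiliar with the method, here is a minimal software sketch of Smith and Waterman local alignment with Gotoh's affine gap recurrences. The scoring values and the gap convention (a gap of length k costs gap_open + (k - 1) * gap_extend) are assumptions chosen for illustration; the brochure does not give the parameters the BIOCCELERATOR uses, and a real protein search would use a substitution matrix rather than a flat match/mismatch score.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score of a and b with affine gap penalties.

    Illustrative parameters: a gap of length k costs
    gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j)
    # E: best score ending at (i, j) with a gap in a (b[j-1] unmatched)
    # F: best score ending at (i, j) with a gap in b (a[i-1] unmatched)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap or extend an existing one.
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets a local alignment restart anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_affine("ACGTACGT", "ACGTTACGT"))
```

Each (i, j) pair is one "matrix cell" in the sense used by the speed figures above: a query of length n against a database of N residues requires n * N such cell updates.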
PROFILESEARCH - Profile searches of DNA or protein databases
Profile searching has been proven to be a very sensitive method for
aligning distantly related sequences. Because the method is
computationally intensive, running PROFILESEARCH on standard platforms
has been prohibitive, with runs taking many hours. Using the 8-processor
version of the BIOCCELERATOR, typical run times are reduced to less
than one minute.
FASTA, TFASTA - Pearson and Lipman similarity searches
These popular methods, though less sensitive than more rigorous search
methods, are the workhorse of biologists for database searches because
of the reasonable time frame in which they run. The different versions
of the algorithm are accelerated by about a factor of ten.
PILEUP - Multiple sequence alignment
Pileup uses two-way alignments and cluster analysis in order to align
multiple sequences.
For detailed information about the functions please consult the Program
Manual of the Genetics Computer Group's Sequence Analysis Software
Package.
According to customer requests, more functions will be supported in the
future.
Typical Run Times
-----------------
A Smith and Waterman database search with a query 500 amino acids long
on version 26.0 of Swiss-Prot takes approximately 40 seconds on a
BIOCCELERATOR with 8 processors. A Profile Search with a profile of
length 500 takes the same time. Run times scale down linearly as more
processors are added and scale up linearly as the database or query
grows.
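The scaling rule can be checked with simple arithmetic: the number of matrix cells is the query length times the number of residues in the database, and the run time is that count divided by the throughput. The sketch below assumes the peak rate of 20 million cells per second per processor implied by the configuration table and ignores any host or data-transfer overhead. The database size it prints is only the value implied by the quoted 40 seconds, not an official Swiss-Prot figure.

```python
# Assumption: peak rate per processor, derived from the configuration table.
CELLS_PER_SECOND_PER_PROCESSOR = 20_000_000


def run_time_seconds(query_len, db_residues, processors):
    """Estimated search time: total matrix cells divided by throughput."""
    cells = query_len * db_residues
    return cells / (processors * CELLS_PER_SECOND_PER_PROCESSOR)


# Database size implied by "500-residue query, 8 processors, about 40 s".
implied_db_residues = 40 * 8 * CELLS_PER_SECOND_PER_PROCESSOR // 500

if __name__ == "__main__":
    print(implied_db_residues)
    print(run_time_seconds(500, implied_db_residues, 8))
    print(run_time_seconds(500, implied_db_residues, 16))
```

Under these assumptions, doubling the processors halves the estimate, and doubling the query or database length doubles it, which matches the linear scaling described above.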