Fasta, profile searches using BIOACCELERATO

Laurent Duret duret at evoserv.univ-lyon1.fr
Fri Mar 18 02:09:38 EST 1994


> Anyone know anything about BIOACCELERATOR from Compugen (?)

I just know that Eli Mintz from Compugen Ltd. (Israel) is involved in this
work. He gave a lecture/demonstration in Grenoble (France) in November 
(see lecture summary below).

> Does anyone know about this product or in fact have one?

You could ask Jean-Jacques Codani (email: Jean-Jacques.Codani at inria.fr).

Amicalement,

Laurent Duret
Laboratoire de Biometrie, Genetique et Biologie des Populations
URA CNRS 243 Universite Claude Bernard - Lyon I
43, Bd du 11 Novembre 1918 F-69622 Villeurbanne cedex


///////////////////////////////////////////////////////////////////////////

--------------------------------------------------------------------------

Title: The BIOCCELERATOR - A Sequence Analysis Accelerator Compatible 
                       With The GCG Program Suite

Abstract:
The Bioccelerator is a supercomputer that accelerates DNA and protein 
sequence analysis functions, such as Smith-Waterman database searches 
and Gribskov et al. Profile Searches, by a factor of 100 to 1000 
relative to high end workstations. The BIOCCELERATOR is fully compatible 
with the Genetics Computer Group's (GCG) suite of programs and is 
transparent to the user except for speed enhancements. Knowledge of how 
to use the GCG program suite is all that is required in order to use the 
Bioccelerator effectively.

In the presentation, the BIOCCELERATOR's concept, architecture and usage 
will be explained. Following the presentation, a few examples will be 
run on the BIOCCELERATOR to demonstrate its abilities.

---------------------------------------------------------------------------

                    T H E      B I O C C E L E R A T O R
                    =====================================

                  Key Features of the BIOCCELERATOR
                  ---------------------------------

- Provides 2-3 orders of magnitude acceleration relative to high end
workstations for Smith and Waterman database searches, reaching speeds 
up to 320 million matrix cells per second. 

- The only available solution for fast Profile Searches, enabling searches
at speeds up to 320 million matrix cells per second.

- Fully compatible with GCG's program suite and database formats.

- Accelerates Fasta up to 10 times relative to high end workstations.

- Supports the Fasta, Profilesearch and Pileup functions from the GCG
program suite.

- User controlled through a simple Application Programming Interface
(API).

- Accessible from various platforms through a SCSI-2 compatible interface.

- Seamless network integration enables sharing the BIOCCELERATOR among
many users.

- Simple installation and maintenance.

- Three-year warranty on hardware.

- Easy upgrade path to second generation BIOCCELERATORs.



                        General Description
                        -------------------

The BIOCCELERATOR is a hybrid between application specific hardware and 
a general purpose computer. It offers the speed advantages of 
application specific hardware while retaining the flexibility of 
programmable machines (at the cost of programming complexity). The 
BIOCCELERATOR is not a massively parallel machine. Rather, it utilizes 
up to 16 custom designed ultra fast processors. Since the processors' 
"personality" is re-programmable, the BIOCCELERATOR can accommodate many 
algorithms. 

The BIOCCELERATOR's architecture is modular and accommodates up to four 
modules, each containing up to 4 processors. The maximum configuration 
is therefore 16 processors on 4 modules and the minimum configuration is 
2 processors on 1 module. 

BIOCCELERATOR Configurations

Modules         Processors         Million Matrix 
                                  Cells per Second
--------------------------------------------------
1                   2                    40
1                   4                    80
2                   8                   160
3                  12                   240
4                  16                   320
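
The figures in the table are consistent with a constant peak rate of 20
million matrix cells per second per processor. The short Python sketch
below only restates that arithmetic. The function name and the
per-processor constant are inferred from the table for illustration and
are not taken from Compugen's documentation.

```python
# Per-processor peak rate inferred from the table above
# (40 million cells/s on 2 processors). This is an assumption.
CELLS_PER_SECOND_PER_PROCESSOR = 20_000_000


def peak_cells_per_second(processors: int) -> int:
    """Peak matrix cells per second, assuming strictly linear scaling."""
    if not 2 <= processors <= 16:
        raise ValueError("the text describes configurations of 2 to 16 processors")
    return processors * CELLS_PER_SECOND_PER_PROCESSOR


if __name__ == "__main__":
    # Reproduce the rows of the configuration table.
    for n in (2, 4, 8, 12, 16):
        rate = peak_cells_per_second(n) // 1_000_000
        print(f"{n:2d} processors: {rate} million matrix cells per second")
```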




                   GCG Compatibility
                   -----------------

Using the BIOCCELERATOR is extremely easy. Since it is fully integrated 
with the GCG program suite, calling the familiar GCG functions is all 
that is required. If the software detects the presence of the 
BIOCCELERATOR, it is used to accelerate the calculations. If the 
BIOCCELERATOR is not detected, the software only version of the program 
is run. The algorithms used by the hardware and software only versions 
of the product are identical. The output generated by the two versions 
for the same input is exactly the same.

When installing the BIOCCELERATOR there is no need to alter the GCG 
databases. All GCG database formats are supported. The hardware version 
of the programs will be constantly updated to reflect the updates in the 
GCG package. Hardware support for additional functions will also be 
added over time.
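
The detect-and-fall-back behaviour described above can be pictured with
a small Python sketch. It is purely illustrative: the function and
parameter names are hypothetical and do not correspond to any GCG or
Compugen interface. The only point it makes is that the caller issues
the same call either way and both backends must return the same result.

```python
from typing import Callable, List, Optional

# Hypothetical type for a search backend: takes a query and a list of
# database entries, returns one score per entry.
SearchFn = Callable[[str, List[str]], List[int]]


def run_search(query: str,
               database: List[str],
               software_search: SearchFn,
               hardware_search: Optional[SearchFn] = None) -> List[int]:
    """Use the accelerated backend if one was detected, else the software one."""
    backend = hardware_search if hardware_search is not None else software_search
    return backend(query, database)


def toy_scores(query: str, database: List[str]) -> List[int]:
    """Toy stand-in for a real search: counts matching positions per entry."""
    return [sum(a == b for a, b in zip(query, entry)) for entry in database]


if __name__ == "__main__":
    db = ["ACGA", "TTTT"]
    # Same call, with and without an "accelerator"; results must agree.
    print(run_search("ACGT", db, toy_scores))
    print(run_search("ACGT", db, toy_scores, hardware_search=toy_scores))
```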

                 Application Programming Interface
                 ---------------------------------

For researchers who write their own programs or do not use the GCG 
package, Compugen offers an Application Programming Interface that will 
enable them to control the BIOCCELERATOR from any application. Compugen 
will also support researchers who use modified versions of the 
established search algorithms and will customize the BIOCCELERATOR 
programs to their specific needs. 

                 Standard SCSI-2 Interface
                 -------------------------

In order to function properly, the BIOCCELERATOR must be connected to a 
host. The host is mainly responsible for the user interface and for 
supplying the BIOCCELERATOR with the necessary data. The host connects 
to the BIOCCELERATOR via a standard SCSI-2 interface. In essence, the 
BIOCCELERATOR is a SCSI-2 compatible peripheral, like a disk drive. 
Software drivers are currently available for Silicon Graphics and Sun 
workstations. 

                 Seamless Network Integration
                 ----------------------------

The BIOCCELERATOR is network ready. The server has to reside on the 
workstation that is physically connected to the hardware but can be 
accessed through Remote Procedure Calls (RPC) from any workstation on 
the network. Please note that transferring databases through the network 
may slow down searches. The recommended way to use the BIOCCELERATOR is 
to remotely login to the machine connected to it.

                Simple Installation and Maintenance
                -----------------------------------

Installing the BIOCCELERATOR is straightforward and involves connecting 
two cables, loading software and running a verification test. The 
BIOCCELERATOR checks itself using a series of automatic tests that are 
run periodically. If any problem is detected the user is notified and 
given precise actions to follow.

                 Three-Year Hardware Warranty
                 ----------------------------

The BIOCCELERATOR is built according to the most stringent standards, 
enabling us to offer our customers a three-year warranty on the 
hardware. In the unlikely case that the product fails, a new one will be 
shipped immediately. 



                     GCG FUNCTIONS SUPPORTED
                     -----------------------

As of January 1994, the BIOCCELERATOR will support a number of functions 
from the GCG program suite:

Smith and Waterman database search 
This new GCG function will only work on hosts with access to the 
BIOCCELERATOR. The function operates both on protein and DNA databases. 
The algorithm used is the Smith and Waterman algorithm as modified by 
Gotoh with full affine scoring. 

PROFILESEARCH - Profile searches of DNA or protein databases
Profile searching has been proven to be a very sensitive method for 
aligning distantly related sequences. Because the method is 
computationally intensive, running PROFILESEARCH on standard platforms 
has been prohibitive, with runs taking many hours. Using the 8 processor 
version of the BIOCCELERATOR, typical run times are reduced to less 
than one minute.

FASTA, TFASTA - Pearson and Lipman similarity searches
These popular methods, though less sensitive than more rigorous search 
methods, are the workhorse of biologists for database searches because 
of the reasonable time frame in which they run. The different versions 
of the algorithm are accelerated by about a factor of ten.

PILEUP - Multiple sequence alignment
Pileup uses two way alignments and cluster analysis in order to align 
multiple sequences.

For detailed information about the functions please consult the Program 
Manual of the Genetics Computer Group's Sequence Analysis Software 
Package. 

More functions will be supported in the future in response to customer 
requests.
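
For readers unfamiliar with the Smith and Waterman search listed above,
the following Python sketch computes a local alignment score using
Gotoh's affine gap recurrences. It is a minimal illustration, not the
BIOCCELERATOR or GCG implementation: the match/mismatch scores and gap
penalties are arbitrary assumptions (a real protein search would use a
substitution matrix), and only the score is returned, not the alignment.
Each pass through the inner loop fills one "matrix cell" in the sense of
the speed figures quoted earlier.

```python
def smith_waterman_gotoh(a: str, b: str,
                         match: int = 2, mismatch: int = -1,
                         gap_open: int = 3, gap_extend: int = 1) -> int:
    """Best local alignment score of a and b.

    A gap of length k costs gap_open + k * gap_extend (affine penalty).
    Scoring parameters are illustrative assumptions.
    """
    neg_inf = float("-inf")
    n = len(b)
    h_prev = [0] * (n + 1)        # H values of the previous row
    f_prev = [neg_inf] * (n + 1)  # F values (vertical gaps) of the previous row
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (n + 1)
        f_cur = [neg_inf] * (n + 1)
        e = neg_inf               # E value (horizontal gap) along the current row
        for j in range(1, n + 1):
            # Either extend an existing gap or open a new one.
            e = max(e - gap_extend, h_cur[j - 1] - gap_open - gap_extend)
            f_cur[j] = max(f_prev[j] - gap_extend,
                           h_prev[j] - gap_open - gap_extend)
            diag = h_prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            h_cur[j] = max(0, diag, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


if __name__ == "__main__":
    print(smith_waterman_gotoh("ACGTACGT", "ACGTTACGT"))
```

Keeping only the previous row in memory is one common way to score a
whole database without storing full matrices; whether the hardware does
anything similar is not stated in the text.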


                    Typical Run Times
                    -----------------

A Smith and Waterman database search with a query 500 amino acids long 
on version 26.0 of Swiss-Prot takes approximately 40 seconds on a 
BIOCCELERATOR with 8 processors. A Profile Search with a profile of 
length 500 takes the same time. Run times decrease in proportion to the 
number of processors and increase linearly with the size of the 
database or the length of the query.
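
The scaling statement above amounts to: run time = (query length x
database residues) / (matrix cells per second). The Python sketch below
works through that arithmetic under the assumption of a sustained 20
million cells per second per processor, the rate inferred from the
configuration table. Function names are illustrative. The database size
it derives is only what the quoted 40 seconds would imply at peak rate;
it is not a statement of the actual size of Swiss-Prot 26.0, and real
runs include overhead that this ignores.

```python
RATE_PER_PROCESSOR = 20e6  # matrix cells per second; inferred, an assumption


def estimated_seconds(query_length: int, database_residues: int,
                      processors: int) -> float:
    """Idealised run time: total matrix cells divided by aggregate rate."""
    cells = query_length * database_residues
    return cells / (processors * RATE_PER_PROCESSOR)


def implied_database_residues(seconds: float, query_length: int,
                              processors: int) -> float:
    """Database size that a quoted run time would imply at peak rate."""
    return seconds * processors * RATE_PER_PROCESSOR / query_length


if __name__ == "__main__":
    # Quoted example: 500-residue query, 8 processors, about 40 seconds.
    residues = implied_database_residues(40, 500, 8)
    print(f"implied database size: {residues:,.0f} residues")
    # Doubling the processors halves the idealised run time.
    for p in (8, 16):
        print(f"{p:2d} processors: {estimated_seconds(500, int(residues), p):.0f} s")
```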








