HASSLE v5
Nicole Redaschi
redaschi at comp.bioz.unibas.ch
Fri Oct 27 19:34:25 EST 1995
/BioComputing der Universitaet Basel
/ |
/ |
/ |
/ |
/ | H i e r a r c h i c a l
/ |
/ | A c c e s s
/ |
/ | S y s t e m f o r
/ |
/ | S e q u e n c e
/ |
/ | L i b r a r i e s i n
/ |
/_______________| E u r o p e
HASSLE has been completely redesigned to feature a structured application
programmer's interface (API) and to comply with the scientist's preferences
for generic graphical user interfaces on the Mac and PC. The layered design
has been implemented in ANSI C, using the NCBI toolbox and the IBM class
library user interface (ICLUI) for the graphical user interface (GUI) on PC,
Mac, and Motif/DecWindows platforms. 'Software inspection' was applied for
quality control: the written code was independently reread and compared
with the design specification, and paired with extensive documentation.
HASSLE v5 runs on most UNIX flavours, various flavours of IP
implementations on VMS, Windows 3.x, MacOS, and OS/2. The major authors of
HASSLE v5 are F.Eggenberger, R.Doelz and N.Redaschi.
Services run via HASSLE are announced in a subsequent posting.
HASSLE source code is available at the ftp archives:
ftp://ftp.switch.ch/mirror/embnet-ch/bioftp-sw/hassle.tar.Z
ftp://ftp.switch.ch/mirror/embnet-ch/bioftp-sw/HASSLE.BCK
ftp://bioftp.unibas.ch/bioftp-sw/hassle.tar.Z
ftp://bioftp.unibas.ch/bioftp-sw/HASSLE.BCK
Extensive documentation is available in HTML format. The FAQ (appended to
this announcement as a text version) and the installation, user and
provider guides are available electronically via
http://www.ch.embnet.org/hassle.html
or as a printed version (ISBN 3-905 434-01-6; inquire for details).
Acknowledgements
----------------
BioComputing Basel is supported by Basel University and grants from the Swiss
National Science Foundation and the Bundesamt fuer Bildung und Wissenschaften
within the frame of the EU programs BRIDGE and BIOMED.
****************************************************************************
HASSLE FAQ
****************************************************************************
What is HASSLE
==============
HASSLE is the name of network protocol software which allows applications
to be run remotely. Its current application is in the biological area. The
current version is 5; it was written by F.Eggenberger, R.Doelz and
N.Redaschi.
What is the nomenclature
========================
As HASSLE is a multiple client/server application, we decided to call the
end-user site a customer, and the site offering services a provider.
Basic questions
***************
Who benefits from HASSLE
========================
HASSLE is a protocol-driven application which allows computers on
heterogeneous networks to talk to each other. A so-called 'Customer' site
requests a service from a 'Provider', transmits all the data required, has
the job executed at the remote site, and receives the results from the
remote site afterwards. Any user who has access to applications which
compose suitable HASSLE input will benefit from the result of the service.
What are applications for HASSLE
================================
As HASSLE is a protocol application, suitable input must be provided by
other programs. The current area of application is focussed on Biology. In
order to either integrate HASSLE into an existing program environment
(e.g., the GCG environment (see below)), or to prepare input data from
scratch, dedicated applications are provided which are either programs or
scripts and similar procedures.
What Applications are provided
==============================
Current programs which cover biological applications include FASTALERT and
Smith and Waterman sequence database searching, as well as BLAST,
GENEFINDER, EXPLORA and more in the GCG environment (a program package
delivered by Genetics Computer Group, Inc., Madison, Wisconsin). The first
two can also run as standalone applications or scripts.
What applications are accessible with HASSLE
============================================
HASSLE is only a communication protocol, and launches applications
preinstalled at the HASSLE provider site. In Biology, this may include
GCG software (see above), the NCBI suite, or other software. The data
associated with these applications are accessed via HASSLE, i.e. the
application at the customer site can access only the data which are
available to the application at the provider site.
Where does the name come from
=============================
The name HASSLE replaced the name SMBQS (Swiss Molecular Biology Query
System), and is derived from the term 'Hierarchical Access System for
Sequence Libraries in Europe'. It should be emphasized, however, that
HASSLE can do much more than serving biology customers.
What platforms are supported
============================
HASSLE and its associated tools run on various IP implementations provided
for the VMS operating system on either VAX or AXP hardware. All major UNIX
flavors are supported as well, using either generic or GNU compilers.
Small computer support is provided for Windows, Mac and OS/2.
What is the operating principle of HASSLE
=========================================
HASSLE is a client/server application which runs asynchronous jobs on the
provider side. A HASSLE customer starts a client which authenticates the
network user, identifies the service and its availability, and transmits
all data. In reply to an execution command sent to the remote server, the
client receives a communication channel number (i.e., a socket number)
which is used to open a server at the customer side. This server process
receives the answer from a remote HASSLE client which is started at the
provider site at the end of the computation.
What is the authentication method in HASSLE
===========================================
Upon the initial start of the communication, the HASSLE customer sends a
request to a HASSLE provider which includes two strings (which are 'GUEST'
and 'DEMO' if not explicitly specified). The HASSLE provider also knows
the name of the computer where the request comes from. If this triad of
keywords is not found in an accounting file stored at the provider, the
customer is entered into this table, and a 'credit' of a preset amount of
'credit units' is assigned to the customer name. When a service is
specified, the HASSLE provider subtracts the 'cost' of this service from
the 'credits'. If, upon initial contact, the HASSLE provider finds that no
more credits are available, it rejects the requestor.
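The bookkeeping described above can be sketched in a few lines of C. This
is only an illustration and not the HASSLE source: the structure names,
the table size, the value of DEFAULT_CREDIT and the function names are
all assumptions made for the example, and a service that costs more than
the remaining credit is simply allowed to push the balance below zero,
which the FAQ does not specify.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CUSTOMERS  64    /* hypothetical table size */
#define DEFAULT_CREDIT 100   /* hypothetical preset amount of credit units */

/* One row of the accounting table: the triad of keywords plus credits. */
struct customer {
    char user[32];
    char account[32];
    char host[64];
    int  credits;
};

struct account_table {
    struct customer rows[MAX_CUSTOMERS];
    int count;
};

/* Find the (user, account, host) triad; if it is unknown, enter it with
 * the default credit. Empty strings fall back to GUEST and DEMO.
 * Returns NULL only when the table is full. */
struct customer *lookup_or_register(struct account_table *t, const char *user,
                                    const char *account, const char *host)
{
    struct customer *c;
    int i;

    assert(t != NULL && host != NULL);
    if (user == NULL || *user == '\0') user = "GUEST";
    if (account == NULL || *account == '\0') account = "DEMO";

    for (i = 0; i < t->count; i++) {
        c = &t->rows[i];
        if (strcmp(c->user, user) == 0 && strcmp(c->account, account) == 0 &&
            strcmp(c->host, host) == 0)
            return c;
    }
    if (t->count == MAX_CUSTOMERS)
        return NULL;

    c = &t->rows[t->count++];
    snprintf(c->user, sizeof c->user, "%s", user);
    snprintf(c->account, sizeof c->account, "%s", account);
    snprintf(c->host, sizeof c->host, "%s", host);
    c->credits = DEFAULT_CREDIT;
    return c;
}

/* Check made on initial contact: reject when no credits are left. */
int admit(const struct customer *c)
{
    return c->credits > 0;
}

/* Subtract the cost of the requested service from the credits. */
void charge(struct customer *c, int cost)
{
    c->credits -= cost;
}
```

A provider built along these lines would call lookup_or_register() and
admit() when a request arrives, and charge() once the service is known.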
How does HASSLE fault tolerance work
====================================
If HASSLE rejects a request because of missing credits, overload, an
unavailable service, or operational reasons, the rejecting provider names
other HASSLE providers which are believed to accept similar requests.
HASSLE customers keep internal tables of these possible providers in order
to access them on demand.
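The fallback behaviour can be pictured as a loop over the customer's
provider table that grows whenever a rejecting provider names an
alternative. The sketch below is an assumption about how such a table
could look, not the actual HASSLE data structure; demo_submit() merely
stands in for the real network exchange, and all host names are
placeholders.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PROVIDERS 16   /* hypothetical table size */
#define HOST_LEN      64

enum reply { REPLY_OK, REPLY_REJECTED };

/* The customer's internal table of known providers. */
struct provider_table {
    char hosts[MAX_PROVIDERS][HOST_LEN];
    int count;
};

/* Add a host unless it is already known or the table is full.
 * Returns 1 if the host was added, 0 otherwise. */
int remember_provider(struct provider_table *t, const char *host)
{
    int i;

    assert(t != NULL && host != NULL);
    for (i = 0; i < t->count; i++)
        if (strcmp(t->hosts[i], host) == 0)
            return 0;
    if (t->count == MAX_PROVIDERS)
        return 0;
    snprintf(t->hosts[t->count++], HOST_LEN, "%s", host);
    return 1;
}

/* Hypothetical callback: contacts `host`; when it rejects the request it
 * may write one alternative provider into `referral`. */
typedef enum reply (*submit_fn)(const char *host, char *referral, size_t len);

/* Try the providers in table order, learning referrals on the way.
 * Returns the index of the accepting provider, or -1 if all rejected. */
int submit_with_fallback(struct provider_table *t, submit_fn submit)
{
    int i;

    for (i = 0; i < t->count; i++) {
        char referral[HOST_LEN] = "";

        if (submit(t->hosts[i], referral, sizeof referral) == REPLY_OK)
            return i;
        if (referral[0] != '\0')
            remember_provider(t, referral);
    }
    return -1;
}

/* Stand-in for the network exchange: one busy host that refers elsewhere. */
enum reply demo_submit(const char *host, char *referral, size_t len)
{
    if (strcmp(host, "busy.example.org") == 0) {
        snprintf(referral, len, "%s", "spare.example.org");
        return REPLY_REJECTED;
    }
    return REPLY_OK;
}
```

Starting from a table that holds only busy.example.org, the loop learns
spare.example.org from the rejection and submits the job there.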
What is the documentation material available
============================================
This FAQ is available on anonymous ftp at
nic.switch.ch:/mirror/embnet-ch/bioftp-sw/hassle and at
bioftp.unibas.ch:/programs/bioftp-sw/hassle, as well as on
http://www.embnet.unibas.ch/hassle.html. There is a paper in CABIOS
10(1), 1994 by R.Doelz, HASSLE: A program to access biological sequence
databases remotely.
The full HASSLE documentation (approx. 120 pages) is available as a
printed document or as a WWW page (http://www.ch.embnet.org/hassle.html).
The API documentation is generated from the source code. Fully documented
source code is available under a license (see below).
Where is HASSLE available (archive sites)
=========================================
The HASSLE distribution, including documentation, is available on anonymous
ftp at nic.switch.ch:/mirror/embnet-ch/bioftp-sw/hassle and at
bioftp.unibas.ch:/programs/bioftp-sw/hassle, in either compressed tar or
VMS BACKUP file format. A directory tree containing individual files is
available as well.
Starting to run HASSLE
**********************
What prerequisites are there to run HASSLE
==========================================
If you plan to run HASSLE regularly it is a good idea to check with your
system manager, but you do not need system privileges to install a HASSLE
customer.
You need to have access to the international Internet, and your host must
be registered in the Internet Domain Name System. You should run HASSLE on
one of the supported platforms (see above).
Less than 2 MByte of disk space is sufficient if you use precompiled
binaries; the source tree and documentation take less than 5 MByte (you
should have about 10 MByte accessible, though). If you want to compile
HASSLE, you need a C compiler. The setup of HASSLE is described in the
documentation. Standalone tools have their own specific documentation. If
you are going to use the current release of the GCG-compatible tools you
need the GCG software as well as FORTRAN and C compilers licensed on the
platform you use.
Can I do keyword retrieval with HASSLE
======================================
The XSRS1 service retrieves entries given as entry codes of a database,
or as a list of filenames. As of this writing, there is also the option to
obtain HASSLE via the SRS package, where it is paired with an 'hgetz'
program to access an SRS server remotely via HASSLE.
Does HASSLE cost money
======================
The HASSLE software package is copyrighted code but available free of
charge. If you use HASSLE, you use other people's resources. This
includes both the bandwidth of the wire you use for communication and the
compute and disk resources at the HASSLE provider site. You are
symbolically charged 'credit units' for the services you use, but this is
not to be equated with spending real money. If you use HASSLE on a large
scale, however, you might reach the limit of your 'credits' and be
excluded from certain HASSLE provider sites. You might then consider
negotiating with the corresponding HASSLE provider individually.
Is there a 'trial run' option
=============================
HASSLE is available for selected platforms as binary image. The
script-based tools are easily configured. If you do not suffer from shared
library version mismatches it is possible to run HASSLE as a trial without
much configuration or installation problems.
Do I need additional software to run HASSLE
===========================================
Standalone tools (FastAlert, MPsrch) come with HASSLE built-in and are
ready to go.
To compile HASSLE code, a C compiler is required. If you want to run the
GCG-based environment tools then you need a GCG license and suitable
FORTRAN and C compilers, and possibly a software developer's kit for the
Motif interface. The HASSLE provider suite requires that all applications
and data are installed on the host where the provider suite is run.
Do I need a compiler to run HASSLE
==================================
To compile HASSLE code, a C compiler is required. If you want to run the
GCG-based environment tools then you need a GCG license, and a suitable
FORTRAN compiler additionally.
Do I need to apply for an account to get HASSLE running
=======================================================
HASSLE may use individual USER/ACCOUNT combinations (which are not
identical to username/password but rather handled internally). Without
special customer configuration, or if specially configured but hitting a
provider without this specification, HASSLE uses the combination GUEST and
DEMO and assigns a default 'credit' amount. This default 'credit' usually
allows normal work on a small scale. If you have tried HASSLE and intend
to use it on a large scale, you might reach the limit of your 'credits'
and be excluded from certain HASSLE provider sites. You might then
consider negotiating with the corresponding HASSLE provider individually.
Does HASSLE run on a PC/Mac
===========================
At the time of this writing, HASSLE runs on a PC with either Windows or
OS/2, and on a Mac. The FastAlert and MPsrch applications are available
as binaries. GCG tools are not supported on PC/Mac as the required
libraries are not supported on these platforms.
Running HASSLE as a customer
****************************
Can I get full databases/updates on HASSLE
==========================================
The HASSLE system has been designed to serve as a remote submission
system. To ensure confidentiality, data are six-fold encrypted. This
implies that if you transfer a 50 Mbyte file this will possibly take half
an hour on a reasonable link, due to encryption and compression which
happens on-the-fly. HASSLE default services, therefore, do not include the
transmission of sequence updates. We do, however, run updating via HASSLE
using the LUP (List Update Processing) tools (details on request).
Can I search thousands of sequences with HASSLE
===============================================
HASSLE is a load-balanced system which usually employs a 'load' measure
that averages the system load over several minutes. If you start huge
jobs (meaning many jobs in a short time), the latency of the load
determination might cost you more than you gain from the HASSLE search
engines available. Additionally, you will be placed in a very
low-priority queue. Therefore, if you must search that many times, you
are urged to use a procedure which leaves a couple of minutes between
searches, or your requests will be very much delayed.
Quality control: What databases are accessible via HASSLE
=========================================================
HASSLE is a distributed system; therefore, more than one database version
might be in use when you get a single service executed. All HASSLE
providers try their best to be as up-to-date as possible, but there might
be temporary delays in updating due to resource shortage. E.g., at
Biozentrum Basel we compile a 'non-redundant' database on both DNA and
protein level from all available sequence data world-wide. As this
requires resources, we do this compilation only at the weekend for BLAST,
but every night for FASTA services. This implies that if you do a search
on Monday you'll get a BLAST database from Saturday but a FASTA database
from Monday.
Why did my recent job never come back
=====================================
HASSLE applications try to be as transparent as possible, i.e. the end
user should never see that HASSLE was doing the job rather than a local
computer. To support remote access, a log file is usually produced which
records all decisions taken to guide you to a specific HASSLE service.
This means that if HASSLE was doing the job, you will have at least two
files in case of correct operation: one result file and one log file. If
there was only one HASSLE provider in the system which could do this
service, and there was no reply from the HASSLE server, your client can
be as fault-tolerant as it likes but will still fail. Then you won't see
a result file, and only the log file tells you that there was no
appropriate service available.
Why is the tool I need not there
================================
HASSLE is developed at the Biocomputing Laboratory at Basel University.
This implies that we usually produce only those tools which we can use
ourselves. If you have a demand for a tool and it is not there, or if you
want to provide a service and there is no server template for it, please
understand that we cannot write the server script or tool for you.
However, as far as our resources allow, we are willing to help you write
any tool or server you need by assisting you in case of questions.
How can I debug HASSLE customers
================================
You might have seen that HASSLE is started by the programs with command
line flags. To see more of what HASSLE is doing, you might use -verbose
(or /verbose on VMS) for more verbose output. The flag -debug (/debug)
shows in detail the commands executed on the protocol level, and -edebug
(/edebug) even reveals the encryption process. The latter flag also
produces further reports, such as local file access.
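As an illustration of how a calling program might treat these flags, the
following sketch maps them onto increasing trace levels. Only the flag
names come from the text above; the enum, the function names and the
assumption that the three flags form a strict ordering are made up for
the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace levels, ordered by increasing amount of output. */
enum trace_level { TRACE_QUIET = 0, TRACE_VERBOSE, TRACE_DEBUG, TRACE_EDEBUG };

/* Map a single argument to a trace level. Both the UNIX form (-debug)
 * and the VMS form (/debug) are accepted; anything else is TRACE_QUIET. */
enum trace_level flag_level(const char *arg)
{
    assert(arg != NULL);
    if (arg[0] != '-' && arg[0] != '/')
        return TRACE_QUIET;
    arg++;  /* skip the leading '-' or '/' */
    if (strcmp(arg, "verbose") == 0) return TRACE_VERBOSE;
    if (strcmp(arg, "debug") == 0)   return TRACE_DEBUG;
    if (strcmp(arg, "edebug") == 0)  return TRACE_EDEBUG;
    return TRACE_QUIET;
}

/* Return the highest trace level requested anywhere on the command line.
 * argv[0] is the program name and is skipped. */
enum trace_level trace_level_from_args(int argc, char *const argv[])
{
    enum trace_level best = TRACE_QUIET;
    int i;

    for (i = 1; i < argc; i++) {
        enum trace_level level = flag_level(argv[i]);
        if (level > best)
            best = level;
    }
    return best;
}
```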
Planning to offer a HASSLE provider
***********************************
What precautions are required to offer HASSLE as a server
=========================================================
Networked software is intrinsically unsafe. HASSLE, therefore, is probably
unsafe software. HASSLE utilizes the BSD socket model to perform
communications. This implies that HASSLE runs as a non-privileged process
launched by the internet daemon, which, in turn, runs as 'root' or
'SYSTEM'. There is a special section on 'security' in the HASSLE manual
included with the distribution which explains the precautions
implemented to have HASSLE execute in a reasonably safe environment,
running jobs as a non-privileged user.
If you cannot accept the flaws introduced by this strategy, you had
better not install HASSLE (and delete mail, anonymous ftp, X Windows, etc.
as well).
What additional software is required to run a HASSLE provider
=============================================================
As HASSLE covers only the communication layer (in contrast to both
application and network layer), each of the software packages you want to
offer as a 'service', and its associated data sets, must be installed
separately. The documentation indicates which versions have been tested
to work with HASSLE. E.g., to offer a BLAST service, you need both the
BLAST suite from the NCBI and the corresponding data sets compiled from
the raw data.
What is required to build a HASSLE provider
===========================================
HASSLE will (should) compile flawlessly. Some adjustments need to be made
beforehand in some of the include files. Afterwards, some additional
configuration data must be adapted to enable the corresponding service.
There is no need to recompile HASSLE each time you add a new service, as
the non-protocol options as well as all higher level communication
options are held in configuration rather than source files.
How secure is HASSLE
====================
HASSLE launches a job as the user which is specified at installation time.
The only implication is that data might become large and reasonable
scratch space is required for some services. Therefore, it is recommended
to run HASSLE on disks which do not affect system performance.
How can I avoid abuse as HASSLE Provider
========================================
License issues and other constraints might require HASSLE to enforce
specific accounting. Suitable entries in the key and accounting files,
respectively, ensure that only those colleagues who are explicitly
allowed can work on your computer. The opposite, no restrictions at all
as in the typically applied internet model, is possible but not
encouraged.
Why does HASSLE not run on my system
====================================
As detailed above, HASSLE runs on a large number of operating systems. If
you run one of these and HASSLE does still not perform as expected, there
might be problems with (a) the customer (b) the provider and (c) the
shell/command file environment. It is usually expected that the operating
systems in use today behave similarly. We have, however, observed
incompatibilities in run-time versions of shared libraries, and
incompatibilities of specifically patched versions of some operating
systems. We are prepared to assist, as far as our available resources
allow, in finding ways to run HASSLE executables which we compile on one
of our systems.
Running HASSLE as a provider
****************************
How can I debug new scripts on the Provider side
================================================
The HASSLE documentation outlines the script generation procedure in a
tutorial section. To make HASSLE run only the script, you might want to
set a debug flag as documented in the GENERATOR script of the provider etc
directory. You might then analyze afterwards which files were created and
how these files affected HASSLE's performance. If this doesn't help, you
might want to run hassle5.csh or hassle5.com, respectively, if your
script still has the original file available. Enabling interactivity
('set echo' and 'set verify', respectively) gives additional aid.
How can I clean temporary left-over files
=========================================
The documentation section explains crontab files (on the UNIX operating
system type) which can be used to clean up directory trees. Another
alternative is to design and run a specific HASSLE service which performs
this job. Such details are explained in the full documentation.
How do I optimize the load measuring routine in HASSLE
======================================================
HASSLE needs to confirm its availability by returning a 'load' parameter.
The default settings run the UNIX utility 'uptime' or measure the number
of jobs in the VMS queuing system, respectively. Both use a
multiplication to calculate a number between 0 and 100 which is returned
by HASSLE. You might want to tune this multiplication to generate a BUSY
rejection at different load levels.
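The effect of tuning the multiplication can be seen in a small sketch.
The function names, the clamping and the idea of a separate BUSY
threshold are assumptions made for illustration; the actual routine and
its constants live in the HASSLE provider configuration.

```c
#include <assert.h>

#define LOAD_MAX 100   /* upper end of the 0..100 scale described above */

/* Scale a raw measurement (e.g. a load average from 'uptime', or a count
 * of queued jobs) by a tunable factor and clamp it to 0..LOAD_MAX. */
int scaled_load(double raw, double factor)
{
    double value;

    assert(factor > 0.0);
    value = raw * factor;
    if (value < 0.0)
        value = 0.0;
    if (value > LOAD_MAX)
        value = LOAD_MAX;
    return (int)value;
}

/* Hypothetical rejection rule: the provider answers BUSY once the scaled
 * load reaches the given threshold. */
int is_busy(int load, int threshold)
{
    return load >= threshold;
}
```

With a factor of 20 a load average of 2.0 maps to 40, whereas a factor of
50 maps the same measurement to 100; a larger factor therefore makes the
provider report BUSY at a lower system load.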
How can I monitor HASSLE usage
==============================
UNIX systems write each HASSLE launch triggered by the internet daemon in
the system log files. Both VMS and UNIX versions log the accesses in a
specific log file, including the result (busy, error or successful
completion).
How can I allow more/less anonymous usage
=========================================
You can (a) observe the anonymous usage in the file account or
account.dat, respectively, or (b) change the value of the parameter
default credit in the corresponding include file and recompile the
software. More conveniently, you can use different key files and direct
users to service tables with different credit costs derived from origin
rather than from service (e.g., charge your national hosts less for a
service than international ones, or vice versa). Alternatively, you can
set up a prototype account file and overwrite the live account file at
periodic intervals after saving the original file for accounting or
statistical purposes.
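Charging by origin can be pictured as a lookup on the domain of the
requesting host. The sketch below does not reproduce the HASSLE key file
or service table format; the rule structure, the function names and the
costs are invented for the example, and the comparison is case-sensitive
for simplicity.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `host` ends with `suffix` (for example ".ch"). */
int host_has_suffix(const char *host, const char *suffix)
{
    size_t host_len, suffix_len;

    assert(host != NULL && suffix != NULL);
    host_len = strlen(host);
    suffix_len = strlen(suffix);
    return host_len >= suffix_len &&
           strcmp(host + (host_len - suffix_len), suffix) == 0;
}

/* Hypothetical rule: hosts under `suffix` pay `cost` credit units. */
struct cost_rule {
    const char *suffix;
    int cost;
};

/* Return the cost of the first matching rule, or default_cost if the
 * origin of the request matches none of them. */
int service_cost(const char *host, const struct cost_rule *rules,
                 int nrules, int default_cost)
{
    int i;

    for (i = 0; i < nrules; i++)
        if (host_has_suffix(host, rules[i].suffix))
            return rules[i].cost;
    return default_cost;
}
```

A provider in Switzerland could, for instance, list ".ch" with a low cost
and leave every other origin at the default.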
More sophisticated issues
*************************
HASSLE didn't compile well
==========================
Unfortunately, not every release of HASSLE can be tested on all operating
system platforms with each version of the compiler before release. It
might, therefore, be necessary to change a few (but obvious) details which
are 'less refined'. Contact hassle-reg at comp.bioz.unibas.ch for
details if you run into trouble compiling HASSLE.
How can I customize HASSLE
==========================
To make HASSLE run as you want it, you might consider changing the source
code. Each source code file has a header indicating the level of the
communication and the purpose of the routine. As of version 4.2 and
higher, all routines of each module are extensively documented. From
release 5 onward, each of the routines in HASSLE is documented. To check
whether a given change makes HASSLE react correctly, you might want to
review the file definitions.h and inspect possible values which can be
used as the 'z' value in the software verification.
HASSLE is distributed over the internet including full source code. As we
are concerned about copyright but want to allow recompilation by all, we
distribute the source code in a stripped version without source code
comments. The fully documented source code is available if you sign a
license agreement with us.
Why is HASSLE written in the C programming language
===================================================
Due to portability constraints, HASSLE had to be written in a well
supported, easily adaptable language. Since version 1.0 of HASSLE all
parts of the software use C library routines. We are considering
evaluating a more object-oriented version of HASSLE as of 6.x (no dates
committed).
Can I run HASSLE on a non-connected network
===========================================
Non-connected networks still require a valid name service. This can be
achieved by either a self-rooted name server system or extensive use of
the 'hosts' configuration file. If your network basically has access to
the net but you still want to run HASSLE in isolation, you need to change
two parameters which govern HASSLE's interoperability. The first is the
default socket number (in server.h), which should be changed to an
unoccupied value (usually, numbers larger than 10000 are very safe, and
numbers less than 1024 require system privileges to run HASSLE). If you
are non-connected, would like to prevent HASSLE from escaping, and are
concerned about security, use a port larger than 1024. HASSLE might then
run on non-privileged user ports from the beginning (exception: VMS under
the Multinet implementation of IP). The second parameter to change is the
default key (also in server.h). The compile time definition switch
PARANOID is currently unsupported and does not have the desired effect.
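The rule of thumb for choosing a socket number can be written down as a
small check. The enum and function name are hypothetical and not part of
server.h; the boundaries 1024 and 10000 are the ones quoted above, and
65535 is the largest valid TCP port.

```c
/* Hypothetical classification of a candidate socket (port) number. */
enum port_class {
    PORT_INVALID,     /* outside 1..65535 */
    PORT_PRIVILEGED,  /* below 1024: needs system privileges */
    PORT_USER,        /* 1024..10000: usable without privileges */
    PORT_PREFERRED    /* above 10000: described above as very safe */
};

/* Classify a port number according to the guidance in this section. */
enum port_class classify_port(long port)
{
    if (port < 1 || port > 65535)
        return PORT_INVALID;
    if (port < 1024)
        return PORT_PRIVILEGED;
    if (port <= 10000)
        return PORT_USER;
    return PORT_PREFERRED;
}
```

Whether a given number is actually unoccupied on your host still has to
be checked separately, for example against the local services table.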
Can I use HASSLE functionality without running HASSLE
=====================================================
The HASSLE API has been present in fragmentary form since version 4.0.
HASSLE 5 is fully documented and therefore has a fully documented API
available for use.
--
-------------------------------------------------------------------
Nicole Redaschi - BioComputing - Biozentrum - University of Basel
Klingelbergstr. 70 - 4056 Basel - Switzerland
Tel +41 61 267 22 47 - Fax +41 61 267 20 78
e-mail redaschi at biox.embnet.unibas.ch
-------------------------------------------------------------------