Future of Computational Molecular Biology

Duncan Rouch ROUCHDA at VAX1.COMPUTER-CENTRE.BIRMINGHAM.AC.UK
Wed Oct 16 13:21:00 EST 1991


The following document is relevant both to program writers and to
managers/user support staff of multi-software collections on nodes
that serve interactive users. Any comment is welcome.

The document discusses the changes we intend to make to maximise the
efficiency of computer-aided molecular biology analysis performed on
the UK national SEQNET node at Daresbury, but they would apply equally
well at other sites. These changes include major alterations to both
user-program interfaces and user support documentation.

SEQNET at Daresbury, UK, runs a wide range of molecular biology
software on a VAX 3600 linked to a DECSERVER 5100. It supports over
1300 academic and industrial subscribers. SEQNET is funded by the
Science and Engineering Research Council (SERC), UK.

Duncan Rouch
Frank Wright
Alan Bleasby

<-----///////////////////// CLIP ///////////////////////////////----->

Computational molecular biology from the user's viewpoint: basis for a
better interface between the biologist and the computer.

Contents:

0   Background
1   Introduction
2   Aim, Approach and Strategy
    2.1  Aim and Approach
    2.2  Strategy
3   The Biologist's Problem
4   Proposed Solutions
    4.1  Software Solutions
         4.1.1  A Global Interface Style
         4.1.2  Input/Output Standardisation
         4.1.3  Streamlining Interfaces
         4.1.4  Online Menu System
    4.2  Educational Solutions
         4.2.1  The Role of the User Guide
         4.2.2  Hard Copy and Online Documentation
5   Acknowledgements

------------------------------------------------------------------------
0   Background

Computers have become indispensable tools in the analysis of
biological information. Sophisticated methods of analysis can now be
carried out by running the appropriate software, and new sequence and
structure analysis methods are continually being developed. This has
led to a software explosion in the last few years, which has caused a
corresponding degree of confusion among biologists who must
necessarily use computer systems. Because educating biologists in the
use of such systems was recognised as crucial, a SERC collaborative
computational project (CCP11) was formed in 1990.

CCP11 is devoted specifically to computational molecular biology. So
far it has organised colloquia on topics such as multiple sequence
alignment. The problems of more general education, the provision of
more intuitive biologist/computer interfaces and the production of
documentation from a biologist's-eye view are now being addressed.

This document is a discussion of the problems as we see them and of
the possible solutions. Comment is invited.


Duncan Rouch
Frank Wright
Alan Bleasby



1   Introduction

This is a discussion of the problems a biologist may face when using a
computer system. The biologist is the `user' and has to deal with how
information is presented by the computer (the so-called `user
interface'). This discussion will be used as a basis for writing the
next SEQNET guide for molecular biologists. It addresses both the
users' problems and a strategy for their solution, which will involve
changes to the programs as well as to the user guide material (printed
and online).

In a second document, to follow, we discuss the structure of the
proposed user guide in more detail.


2   Aim, Approach and Strategy

2.1   Aim and Approach

 Aim:      to maximise the efficiency of sequence and structure
           analysis by the biologist on a computer system, such
           as SEQNET, that provides a wide selection of programs
           and packages.

Approach:  this can be achieved by providing assistance in the choice
           of the appropriate method and program (with its associated
           parameters), and in the interpretation of output. The
           program/parameter selection problem can be tackled by
           appropriate software changes; the problems of choosing an
           appropriate method and interpreting output must be dealt
           with by education.

2.2   Strategy

Graphical displays customised for biologists present the most
attractive route to solving the selection problem. These, however,
may require X-Windows terminals, which not all sites can afford; most
systems must therefore provide a text-only user interface, either in
isolation or alongside a graphical interface.

The goal, for both the software and the education strategies, is to
increase efficiency by streamlining the user interfaces to programs.
This has the advantage that the biologist need only remember a limited
number of hardware and software operations. Within a given system,
changes to the underlying operating system (e.g. a move from VMS to
UNIX) or hardware should ideally be hidden from the user, giving a
so-called seamless environment.

Unfortunately, owing to restrictions on the availability of program
source code and to copyright, interfaces cannot be rewritten quickly.
A more realistic fall-back solution is to produce a flexible online
menu system that hides the polymorphic program interfaces. This has to
be combined with improved user support, which will include the
restructured user documentation.
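The following is a minimal sketch in Python of how such a menu layer might hide differing program interfaces. It is illustrative only, not a description of any planned SEQNET software. The user supplies options under one common set of names, and a per-program table translates them into each program's own command syntax. The task names (`align`, `dotplot`), the program commands (`alignprog`, `dotplot`), the option names and both flag styles are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProgramInterface:
    """How one HYPOTHETICAL program expects to be called."""
    command: str
    # Maps a common option name to this program's own spelling of it.
    option_names: Dict[str, str]
    # Formats one option/value pair in this program's command style.
    format_option: Callable[[str, str], List[str]]


def unix_style(name: str, value: str) -> List[str]:
    # e.g. "-in myseq.seq"
    return [f"-{name}", value]


def vms_style(name: str, value: str) -> List[str]:
    # e.g. "/INPUT=myseq.seq"
    return [f"/{name.upper()}={value}"]


# Hypothetical programs with deliberately different interfaces.
PROGRAMS: Dict[str, ProgramInterface] = {
    "align": ProgramInterface(
        command="alignprog",
        option_names={"sequence": "in", "gap_penalty": "gap"},
        format_option=unix_style,
    ),
    "dotplot": ProgramInterface(
        command="dotplot",
        option_names={"sequence": "input", "window": "win"},
        format_option=vms_style,
    ),
}


def build_command(task: str, options: Dict[str, str]) -> List[str]:
    """Translate common option names into the program's own command line."""
    program = PROGRAMS[task]
    args = [program.command]
    for common_name, value in options.items():
        if common_name not in program.option_names:
            raise ValueError(f"{task!r} has no option {common_name!r}")
        native = program.option_names[common_name]
        args.extend(program.format_option(native, value))
    return args


if __name__ == "__main__":
    # The biologist uses the same option name ("sequence") for both tasks.
    print(build_command("align", {"sequence": "myseq.seq", "gap_penalty": "3"}))
    print(build_command("dotplot", {"sequence": "myseq.seq", "window": "11"}))
```

Run as a script, this prints `['alignprog', '-in', 'myseq.seq', '-gap', '3']` and then `['dotplot', '/INPUT=myseq.seq', '/WIN=11']`. In this design, supporting a new program means adding one table entry. The option names the biologist has to remember stay the same.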


3   The Biologist's Problem

	Biologists face the problem of how to analyse a sequence
	with computational molecular biology facilities. They must
	be aware of suitable methods of analysis, and must learn and
	remember how to use the appropriate hardware and software.
	They must also know how to interpret the output generated by
	the software, which includes being aware of the limitations
	of the method and software used.

The wide range of available sequence and structure analysis software
presents molecular biologists with the opportunity to use many
different methods of analysis. Using these methods currently demands
both hardware and software knowledge from the user. Acquiring the
necessary hardware knowledge, such as networking, is relatively
trivial. However, the biologist is confronted by an ever-increasing
array of new software, which more often than not has idiosyncratic
user interfaces. The biologist is therefore required to learn and
remember a completely new set of software commands for each unique
interface style. This is unsatisfactory.

Furthermore, most people have a range of work priorities such that
computing can take only a small fraction of their time: how else will
they collect data to verify the computer-aided predictions? Even if
they can find the time to learn how to use each program, the lack of
constant reinforcement means they are likely to have to relearn the
commands each time they log on. This lack of time can also result in
unfamiliarity with new methods or even a misunderstanding of existing
ones. Inevitably this leads to inappropriate selection of
programs/parameters, incorrect assessment of output, an inertial
barrier against using new software and a tendency to blame the
software for the deficiencies of a method.

The more software commands biologists have to know, the more likely
they are to make mistakes. This wastes time as well as potentially
causing errors in data or its interpretation. The enthusiasm
programmers have displayed in producing arbitrary user interfaces
could, in future, be better channelled into reducing this problem.

In contrast to the polymorphic array of single programs with different
user interfaces, all programs within a package such as the GCG system
share a common interface style, and the same names are used for
commands that perform the same activity in different programs. So, to
use a new program within the package, the biologist need only learn
the commands specific to that program.


4   Proposed Solutions

4.1   Software Solutions

4.1.1   A Global Interface Style

	Short of rewriting most programs to give a common
	interface style, a more practical short-term solution
	is to give priority for improvement to a subset of
	programs that allow all 'basic' computer-aided molecular
	biology methods to be carried out.

The initial set of programs for treatment will include all those
necessary to allow fundam


