DNA Workbench for X-windows

James Tisdall tisdall at amalthea.humgen.upenn.edu
Mon Nov 14 10:13:17 EST 1994


     Software is very important to many biologists.  I've been glad to
see the important points that have been discussed in this thread, especially
about portability, sparked by the thoughts of a VMS system manager.  I'd
like to add a few comments from a background of both theory and practice,
the practice including a stint as a systems manager.


Software engineering:
	
	There are many and sometimes conflicting factors to weigh in
developing software.  The developer takes them into consideration and
finds a solution that meets the main design criteria while satisfying as
many of the other software engineering goals as possible.  In other
words, there's a tradeoff between somewhat conflicting goals, as is common
in the great majority of engineering design.

	Portability is (often, not always) an important goal.  The computer
marketplace and the state of computer science are such that there are
many operating systems, languages, and types of computers.  So one of the
big tradeoffs in the pursuit of portability is the potentially
large amount of time necessary to ensure that a given piece of software will
run on all these systems.  (And by the way, just writing it in C will not
ensure portability - witness the many courses and books on writing
"portable C".)

	Other important goals include minimum price; minimum labor; minimum
cost of "maintenance" (revising, fixing, and extending the software), which
is very often as expensive as the original writing of the software;
testability; and of course suitability of the software for its intended
purpose (legal disclaimers to the contrary notwithstanding).

	Given the design goals, one of the first and most important choices
that a software system designer makes is "what language(s) shall the system
be built in?"

	(I'd like to point out to the biologists that "software engineering"
is an entire subfield of computer science with ongoing research projects
and an extensive literature including dedicated journals, conferences, 
industrial research groups, etc.)

Standards:

	They are good and bad.  The discussion of this thread has pointed up
some of this.  In a young field like computer science, it is important to
recognize that new ideas and approaches are essential; that the most
important standards are de facto standards, determined by the users and
the marketplace; and that even standards become obsolete, sometimes before
they have achieved official sanction.  To most biologists without computer
training, software tools have the appearance of "black boxes" - and wouldn't
it all be much easier if everyone would just write their software for Macs?

	But consider parallel computing (which requires new hardware,
operating systems, and languages).  The "standard" way to build and program
parallel computers hasn't been invented yet, or if it has, the world hasn't
realized it.  It's a current, major research topic, and many different
solutions to many different problems are being investigated.  Yet it is
of paramount importance to the computer world in general; scientific
programmers more specifically; and (computational) biologists in particular.

	My point is that there is no panacea that will solve the inherent
problems of complexity and rapid change in the area of scientific programming.
Some tools are available to help, such as appropriate standards, modularization,
good documentation, software re-use, and so on.  But above all, it is
important to remember that we work in a cultural context, and that there is
a "state of the art", and that it is up to us to try to write software that
takes into account that change is coming.

DNA Workbench and Perl

	Just a few comments on the software that started this thread.

	DNA Workbench is written in the Perl language (which is written in C)
and thus it is as portable as Perl (unix, mac, dos, vms, windows/nt, etc).

	I'm adding graphics to DNA Workbench now, but it isn't finished yet.
The graphics package chosen is Tk (of Tcl/Tk) available for Xwindows.
It is my belief that some new graphics software may come along
that offers greater convenience and portability, so the graphics portion is
written modularly with an eye towards possibly using some other software in
the future.  (I need something which is easy to program, costs little or
nothing for commercial and noncommercial environments, and is portable to
X, macs, dos, windows.)  A major project is underway to port Tk to Macs and
Windows, so perhaps I'll stick with it - we'll see.  Also, it is fairly
easy to port from Perl to C - although I have no immediate plans to do so.
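
	As a rough illustration of what "written modularly" can mean for the
graphics portion, here is a minimal sketch.  It is in Python rather than Perl,
and every name in it (Display, LineDisplay, RecordingDisplay, report) is
hypothetical and not taken from DNA Workbench.  The idea is only that the
application draws through one small interface, so a Tk backend could later be
swapped for some other package without touching the rest of the program.

```python
from abc import ABC, abstractmethod


class Display(ABC):
    """Hypothetical interface that the rest of the program draws through."""

    @abstractmethod
    def show_sequence(self, name, sequence):
        """Render one named sequence and return what was rendered."""


class LineDisplay(Display):
    """Plain-text backend: should work on just about any terminal."""

    def __init__(self, width=60):
        self.width = width

    def show_sequence(self, name, sequence):
        # Break the sequence into fixed-width rows under a header line.
        rows = [sequence[i:i + self.width]
                for i in range(0, len(sequence), self.width)]
        text = "\n".join([">" + name] + rows)
        print(text)
        return text


class RecordingDisplay(Display):
    """Stand-in for a graphical backend (assumed, e.g. Tk); it only records calls."""

    def __init__(self):
        self.calls = []

    def show_sequence(self, name, sequence):
        self.calls.append((name, sequence))
        return "[window] %s: %d bases" % (name, len(sequence))


def report(display, name, sequence):
    """Application code: knows the Display interface, not the backend behind it."""
    return display.show_sequence(name, sequence.upper())
```

	Calling report(LineDisplay(), "s1", "acgt") and report(RecordingDisplay(),
"s1", "acgt") runs the same application code against two different backends;
only the object passed in changes.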

	At present, you can just download a "bin" file from our ftp site
for your mac, and the program will run - no other installation required.
The Unix version does require you to install the Perl language if not already
installed (which requires a C compiler such as the free "gcc").  The dos
version just requires you to grab the "perl.exe" - but since the networking
facilities for Perl on dos are just in development (hopefully to be 
available soon) the dos version of DNA Workbench won't do the internet
access parts of the program as yet.  There is a new port of perl to VMS
which includes networking support - I haven't had a chance to try it yet,
but will be glad to assist those who wish to install it.

	I chose to write DNA Workbench with a simple line interface so that
it will work for just about anybody on any terminal, as long as the Perl
language works on their computer.  The graphics stuff is an optional add-on,
which is very nice, but it is exactly the portability/cost issue that is
difficult here.  The Tk-Xwindows stuff is not yet completed.

	Another important design goal is that everything that can be done
interactively in DNA Workbench can be done in a script or by calling
DNA Workbench from another program.  My experience is that far too much
software in biology lacks this ability.
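
	One common way to get this property is to route every action through
a single command function, and to make the interactive prompt and the script
runner two thin layers over it.  The sketch below is in Python rather than
Perl; the command names ("revcomp", "length") and all function names are
made up for illustration and are not DNA Workbench's actual commands.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string; unknown bases become N."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs.get(base, "N") for base in reversed(seq.upper()))


# Hypothetical command table: name -> function taking one sequence argument.
COMMANDS = {
    "revcomp": reverse_complement,
    "length": lambda seq: str(len(seq)),
}


def run_command(line):
    """Execute one command line such as 'revcomp AACG' and return text."""
    parts = line.split()
    if not parts:
        return ""
    name, args = parts[0], parts[1:]
    if name not in COMMANDS:
        return "unknown command: " + name
    if len(args) != 1:
        return "usage: " + name + " SEQUENCE"
    return COMMANDS[name](args[0])


def run_script(lines):
    """Scripted use: run each non-blank line and collect the results."""
    return [run_command(line) for line in lines if line.strip()]


def interactive():
    """Interactive use: the same run_command, driven by a prompt."""
    while True:
        try:
            line = input("> ")
        except EOFError:
            break
        if line.strip() == "quit":
            break
        print(run_command(line))
```

	Because interactive() and run_script() both call run_command(), anything
that can be typed at the prompt can also be put in a file or issued by another
program.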

	Finally, my major design goal in writing DNA Workbench was to build
for myself a computing environment that makes it easy to experiment with
algorithms for computational biology.  For a review from a biologist's
perspective, see the Software Corner in a recent issue of TIBS.  Software
is available by anonymous ftp at cbil.humgen.upenn.edu:/pub/dnaworkbench

Choice                   Goals
______                   _____

Perl language            Free
                         Portable (unix, macs, dos, vms, nt, ...)
                         Excellent "string processing" (sequence manipulation)
                         Excellent ability to control other programs
                         Optimized for programmer productivity - fast development
                         Good distributed parallel programming facilities
                         Object oriented
                         Relational database interface, and built-in data stores
                         Graphics support (a "Tk extension" for perl)
                         Good documentation and access to experts
                         Acceptably fast for desired uses

Tk graphics              Free
                         Optimized for programmer productivity - fast development
                         Good documentation and access to experts
                         Acceptably fast for desired uses
                         Available for Xwindows
                         Current project to port to Windows, Windows NT, and
                            Macs, backed by a major software company

Best regards,
Jim

======================================================================
James Tisdall
Departments of Genetics and Computer and Information Science
Computational Biology and Informatics Laboratory, Human Genome Project
University of Pennsylvania
tisdall at cbil.humgen.upenn.edu 215-573-3113
======================================================================
after November 28:
======================================================================
Mercator Genetics
4040 Campbell Avenue
Menlo Park, CA 94025
415-617-0880  FAX x0883
tisdall at mercator.com
======================================================================




