DNA Workbench for X-windows
mathog at seqvax.caltech.edu
Fri Nov 11 17:10:00 EST 1994
>Looking the gift horse straight in the mouth...
A few responses to the responses (names deleted from all quotes).
The part that seems to have rubbed the most people the wrong way is:
1. It is written in ANSI C or Fortran 77 (but NOT both).
This is my take on the objections raised. First, that enforcing the use
of programming standards would be too limiting to creativity or
>Any standards run the risk of being straightjackets.
>My primary point was: let's
>not rush out and convince grant agencies to fund only if it will be
>lowest common denominator software.
The taxpayer in me says "tough potatoes" if the chosen standards are not to
every programmer's liking. From a funding viewpoint, any software that is
developed should be as portable as possible. Now I'll grant you that
Fortran is better for algorithmic work than for some other things, but it's
hard to see where the creativity limitations in C lie. Rather, the problem
with C is more often that the programmer gets a bit too creative!
There is NO QUESTION that the two languages mentioned are the two most
portable ones around, ANSI C being probably slightly more portable than
Fortran 77. If we want the most code, on the most systems, for the least
money, it will be written in one or the other of these two languages. In
time other languages may meet the same criteria and at that point it would
be acceptable to write in those languages too.
I used to agree with the idea that it was OK to write code for research
projects in a nonportable manner and patch them up later. Unfortunately,
sad experience has shown that the "patching up" can be next to impossible.
The software that started this thread is built around Xview and it would
take a major effort for anybody to patch around that. WCS, another
research project, but one with many, many users, is so Sun specific it
hurts. It seems that in both these cases whatever benefits were accrued
by coding for a particular platform have long since been offset by an
inability to reach a larger audience. Also, the more people who use a
piece of software, the more feedback gets back to the developers, and
the faster the software improves. Since the audience is larger for
software that runs on more platforms, it tends to improve more rapidly than
single platform software. For instance, look at gopher, look at WWW.
Another objection was that the languages mentioned were either inadequate,
antiquated, or not free:
>but it is silly to force people to develop modern software with 70's
>and older technology.
>Anything which can be expressed in C++ can be expressed in C
>(which is what Cfront does), but good C++ is much more readable,
>reusable, and maintainable. On the other hand, there are an awful
>lot of great things in PERL, and it is the use of PERL which makes
>DNA Workbench so powerful (no size limits on sequences, full regular
>expression searching, extensibility).
>If you force developers to use FORTRAN 77 and C instead of more modern
>languages (such as FORTRAN 90 and C++), the software will be buggier,
>harder to reuse, and harder to maintain. Grant reviewers should weigh
>these considerations against portability.
>Doesn't really matter, but Fortran is being unbundled and I don't know
>if Linux has a fortran compiler.
First of all, writing in the latest trendiest language may make the
programmer feel good, and it may make the coding for specific operations
somewhat easier, but near as I can tell, it has little material effect on
the final operation of the program. It does have a huge impact on how
portable the resulting code is. I vote for portability.
Second of all, C++ is a bit of a special case. So long as the developer
has the tools to convert their C++ code to ANSI C, and at least the C code
is distributed, then go ahead and write in C++. However, C++ compilers
are far rarer beasts than some of you would believe, and so C++ code should
not be distributed alone since it fails the "generally available" compiler
criterion. I'm also concerned that the C++ standard may not be nearly so
well defined as those for the languages that I mentioned (feel free to
correct this if it's wrong). If Linux has no Fortran 77 compiler then it
would be the only major platform that does not. ANSI C is certainly
available everywhere. However, the only free ANSI C compiler that I know of
these days is GNU C. ANSI C on all other platforms is already an extra-cost
item. If you think not, check your maintenance/licensing agreements. So
what if compilers cost money? So does the operating system, so does the
hardware, so does support. This is the real world.
The claim that C++ or Fortran 90 code is inherently less buggy and easier
to maintain than C or Fortran 77 code is at least unproved, and most likely
wrong. I'm old enough to have lived through several methodology shifts,
each claiming the same benefits that object oriented languages do now.
Nevertheless, the quality of a program still seems to come down to the
programmer's skill and little else.
>Grant money flows for publications, rarely for software: when this
>changes, portability will come quickly.
This is right on. People are rewarded in various ways for publishing high
profile papers. Because of this, there are measures for determining a
paper's impact, for instance by tracing references back. Perhaps we
need an equivalent method for software, so that we would have some rational
basis for rewarding those who do the most, and decreasing funding for those
who do the least. Let's see, the quality measures that spring to mind are:
1) number of program uses per user-year
2) number of program users per year
3) number of programs that include derived code
The first two values would be a bit hard to get on PCs and Macs, but they
shouldn't be too difficult to come up with on Unix and VMS. The last one
would require some sort of code analysis. Hey, at last a good use for all
of the plagiarism detection software and hardware at the NIH!
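As a rough illustration of how the first two measures might be tallied, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the log format (one record per program run, giving program, user, and year), the function name usage_metrics, and the reading of measure 2 as the average number of distinct users per year. It does not describe any existing accounting tool, and measure 3 is left out because it would need code analysis.

```python
from collections import defaultdict

def usage_metrics(log):
    """Tally usage measures from (program, user, year) run records.

    Hypothetical log format: one tuple per program run.
    Returns {program: (uses_per_user_year, users_per_year)}.
    """
    uses = defaultdict(int)        # program -> total number of runs
    user_years = defaultdict(set)  # program -> distinct (user, year) pairs
    years = defaultdict(set)       # program -> distinct years with any run
    for program, user, year in log:
        uses[program] += 1
        user_years[program].add((user, year))
        years[program].add(year)
    return {
        p: (
            uses[p] / len(user_years[p]),        # measure 1: uses per user-year
            len(user_years[p]) / len(years[p]),  # measure 2: mean distinct users per year
        )
        for p in uses
    }

# Made-up records, for illustration only.
log = [
    ("blast", "ann", 1993),
    ("blast", "ann", 1993),
    ("blast", "bob", 1993),
    ("blast", "ann", 1994),
    ("fasta", "bob", 1994),
]
metrics = usage_metrics(log)
```

A user who runs a program in two different years counts as two user-years here, which keeps measure 1 from rewarding a program simply for having been around longer.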
Off the top of my head, I'd guess that BLAST and FASTA are way, way, way up
in the winner's circle for 1 and 2. Hard to say who would win for 3.
mathog at seqvax.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech