DNA Workbench for X-windows

John Barton jjb at watson.ibm.com
Sat Nov 12 10:16:17 EST 1994

In article <11NOV199414105367 at seqvax.caltech.edu>, mathog at seqvax.caltech.edu (David Mathog) writes:
|> >Looking the gift horse straight in the mouth...
|> A few responses to the responses (names deleted from all quotes).
|> The part that seems to have rubbed the most people the wrong way is:
|>   1.  It is written in ANSI C or Fortran 77 (but NOT both).
|> This is my take on the objections raised.  First, that enforcing the use
|> of programming standards would be too limiting to creativity or 
|> productivity:
|> >Any standards run the risk of being straightjackets.
|> >My primary point was: let's
|> >not rush out and convince grant agencies to fund only if it will be
|> >lowest common denominator software.  
|> The taxpayer in me says "tough potatoes" if the chosen standards are not to 
|> all programmers' liking.  From a funding viewpoint, any software that is
|> developed should be as portable as possible.  Now I'll grant you that
|> fortran is better for algorithmic work than for some other things, but it's
|> hard to see where the creativity limitations in C lie. Rather, the problem
|> with C is more often that the programmer gets a bit too creative! 
|> There is NO QUESTION that the two languages mentioned are the two most
|> portable ones around, ANSI C being probably slightly more portable than
|> Fortran 77.  If we want the most code, on the most systems, for the least
|> money, it will be written in one or the other of these two languages.  In
|> time other languages may meet the same criteria and at that point it would
|> be acceptable to write in those languages too. 

   While I am sympathetic to the goal of software that is easy for a system
administrator to install, it is simply not the most important goal for
scientific software generally.  Scientific software is written by scientific
programmers, not by grantors or by programming languages.  In my experience,
these programmers generally fall into three camps; each makes important

   First, there are those who learn as little about computers as they can get
by with and concentrate upon producing computations.  These programmers often
produce excellent results but their programs are often software engineering
disasters that they alone can use.

   Second, there are those who learn a great deal about computing
and push to apply new software technology to scientific programs.  These
programs are more often well structured and reusable, but they include
technologies (C++, graphics, etc.) that are not portable to every machine
and the effort expended on software technology often comes at the expense
of the scientific content.

  A third group produces the bulk of the widely used shared
software.  Here decades-old technologies are embraced and applied to
problems solved five and ten years ago to produce a basic package of
end-user software for a niche market of user-scientists that could not
purchase the software commercially.  Many of the nuances of these programs
are state of the art and when they arrive coincident with other scientific
advances (e.g. sequencing software) their impact is widespread and extremely
significant to many scientists.

  The first two groups are vital to the success of the third group.
If we force all scientist programmers into the third group, the resource
that the first two groups formerly provided will dry up and we will be stuck
with software portable on any computer built in 1977 and solving any problem
important in 1985.

  An alternative solution to the original problem of system administration
would be to stop all funding for portable scientific software and to give
all of that money to the end-user scientists earmarked for purchasing
scientific software.  In this way we create a market for scientific software
and harness the powerful forces of free enterprise to ensure portability
and mass-appeal.  As a bonus we could simplify installation (A:install
in the Program Manager Run dialog...) and dispense with system administrators

|>...some deletions...
|> >Grant money flows for publications, rarely for software: when this
|> >changes, portability will come quickly.
|> This is right on.  People are rewarded in various ways for publishing high
|> profile papers.  Because of this, there are measures around for determining
|> a paper's impact.  For instance,  by tracing references back.  Perhaps we
|> need an equivalent method for software, so that we would have some rational
|> basis for rewarding those who do the most, and decreasing funding for those
|> who do the least.  Let's see, the quality measures that spring to mind 
|> are:  
|>    1) number of program uses/user-year
|>    2) number of program users/year.
|>    3) number of programs that include derived code
|> The first two values would be a bit hard to get on PCs and Macs, but they 
|> shouldn't be too difficult to come up with on Unix and VMS.  The last one
|> would require some sort of code analysis.  Hey, at last a good use for all
|> of the plagiarism detection software and hardware at the NIH! 
|> Off the top of my head, I'd guess that BLAST and FASTA are way, way, way up
|> in the winners circle for 1 and 2.  Hard to say who would win for 3. 
|> Comments?

   The simplest quality measurement method for portable end-user software
would be annual revenue: as above, take the money away from developers and give
it to end-users directly.  The difficult issue in judging scientific work
is not to see how much it is used this year, but to understand its *potential*
impact in the years to come.  This is the essential dividing line between
research and development.  Blurring this line to support development of
end-user scientific software rather than supporting its purchase is only
one approach.  Let's not let this approach dominate and crush all kinds
of scientific programming efforts.  Instead, let's work to see software
publication compete with journal publication in terms of quality and value.

  (Since I managed to insult everyone involved in scientific software, I hope
you can see that I am just trying to illustrate that trade-offs have 

|> David Mathog
|> mathog at seqvax.bio.caltech.edu
|> Manager, sequence analysis facility, biology division, Caltech 


John J. Barton        jjb at watson.ibm.com            (914)784-6645
H1-C13 IBM Watson Research Center P.O. Box 704 Hawthorne NY 10598
