open-source software for bioinformatics (was Re: Unix vs Linux - the movie.)

David Mathog mathog at seqaxp.bio.caltech.edu
Fri Jul 28 10:51:58 EST 2000


In article <398069d7$1 at news.ucsc.edu>, karplus at cse.ucsc.edu (Kevin Karplus) writes:
>I'm generally in favor of having source code available for any
>software I buy or use, but reluctant to part with source code that I
>have spent years working on.  

The asymmetry of that position doesn't bother you?  

>
>Having the source code certainly makes bug fixes easier, and even
>compensates somewhat for the usually grossly inadequate and inaccurate
>documentation. 

It also lets the end user run the code on a platform that the developer
doesn't support.  Why should I have to run the same OS as you just to run a
program you developed?   Most Molecular Biology code is relatively portable
in that it just opens files, crunches data, and stores it again - nothing
particularly OS sensitive there.  All I usually need is the code and it's
off to the races.  With binaries only, more often than not I'm out of luck.

But there's another side to the "closed source" argument.  That is, I
consider running a program to be "doing an experiment" and closed source
software is the equivalent of a "methods" section which says: "Put your DNA
in the black box containing unknown reagents and incubate for 30 minutes."
That is, it is not reproducible because key aspects of the "method" are
either unknown, or known only poorly.  And I don't mean to impugn anybody's
coding and documenting abilities, but you don't have to look at all that
much code before you encounter a program which does not work in exactly the
manner indicated in the documentation.  Back to the methods analogy, that's
like saying a reaction had 10 mM MgCl2 in it, when in fact, it didn't.  No
source code - no way to detect this sort of discrepancy. 

Moreover, the ultimate argument against closed source academic software is
that, in effect, it is roughly analogous to refusing to share reagents.  
While some may argue that shipping a binary is the same as sending somebody
a tube of antibodies, I do not agree.  To verify that software is working
correctly one often needs to get down into the guts of it while it is 
running and poke around with a debugger, for instance to figure out why
a program uses too much memory or runs slowly.  I can't even count the
number of times I've had to do this.  And the "idea" of the software is not 
just the input and output, but the way it gets there, which, as I've said 
above, usually cannot be fully understood from just the written 
documentation - you have to look at the code.

> 
>On the other hand, it also makes it much harder for the originator of
>the program to keep the program maintained.  Bug fixes that are done
>at one site are unlikely to propagate back to the original release.

What makes you say that?  I think most of us report bug fixes back to the
program's authors.  

>We have local bug fixes and enhancements for several of the
>open-source programs we use, and I have no idea whether the people who
>did the fixes sent them back to the original maintainers of the
>programs, or if they did, whether they were incorporated into the
>official version.  

If they did not send them back, fire them, that's part of the job.  They
may or may not have been incorporated, and you can't control that in any 
case.

>If we pick up a new release, we may get something
>much buggier than what we currently have.  For that matter,
>open-source software is often fairly unstable with new "features"
>added by one person being incompatible with those added by another.
>It takes some pretty strong reviewing to keep an open-source program
>stable and usable.

And that differs from an MS Office SR release how?  But seriously,
stick with the stable running version and only permanently install the
newer ones when you have reason to believe that most of the kinks have
been worked out.  It's up to the person who controls the software
(and even freeware is usually controlled by a person or small group of 
persons) to integrate changes successfully - that doesn't mean that
people should stop submitting them.

One odd thing about the commercialization of academic software has been 
that the profits, however much they are, never seem to go back in such a 
way as to result in a more professionally maintained academic product. 
Can anybody here who has had their software "commercialized" report that 
they received increased funding for development or maintenance of the 
software as a result?  If not, WHY ARE YOU COMMERCIALIZING IT?

>
>If you decide (as our University encourages us) to sell the program to
>commercial companies, while giving it away to academics, government
>labs, and non-profits, then you have to retain control of the source
>code, or it is too easy to pirate the code.  This doesn't mean that
>the source code needs to be secret, but it may need to be protected by
>license agreements.  

If you've published the algorithm and somebody wants it badly enough 
they'll rewrite the program from scratch.  Your copyrights won't protect 
you in that case and neither will your license agreement, as they will 
never have signed it.  If you hold some sort of patent on the algorithms
in the program you may still have legal grounds for a suit, but not much 
software is protected in this manner.  (Who knows about the future though - 
the US Patent Office grants patents for just about anything these days.) 
Providing source code does not decrease your legal rights by one iota.
Admittedly, though, providing source code makes it easier for somebody to
illegally incorporate your code into their own.  That happened to GCG,
after which they took the source code out of their distributions, and as a
consequence rendered it *much* less useful than it formerly had been. 

I also want to point out that on one hand you're asking outside maintainers
of your academic code to report bug fixes back to you, that is, act as
minor developers, and at the same time are not offering to extend to them
any of the profits which will result from their work.  And tracking down
bugs in other people's software is a heck of a lot of work!

>
>It turns out that most users don't care about the source code, so just
>distributing executables and documentation satisfies over 90% of the
>users (how many of you have had to get source code for BLAST?).

Most users also don't care how the program works - they "just want the
results".  Bad attitude, but there you are.  And yes, I have used the
source code for BLAST (and would that one could obtain ONLY the source code
for BLAST and not the entire NCBI toolkit!) 

>If you want to retain control over your program (an idea that is
>anathema to some in the open-source community), then a two-tier
>license agreement is often the best strategy---most users get a simple
>license with permission to use the executables, and those who really
>have a need for the source code get a more detailed license.

I fill out those licenses all the time.  But I've got to tell you that I 
really hate them with a passion - there are already copyrights stamped all
over the code so I can't legally merge it into a commercial product.  I don't
mind the part about noncommercial usage (although in reality how the heck
am I going to prevent one of the professors, who all have commercial ties
these days, from using the academic version of the software for commercial
purposes?) The licenses also make me legally liable if the source code gets
out.  If the source code were generally available I wouldn't have to worry
about that.  The source code invariably lives on the same machine that a
couple of hundred users have access to.  One slip on the protection
settings and somebody could copy it.  Or what about my backup tapes? Or
what about a disk on a decommissioned machine?  Or what if a Unix machine 
gets hacked?  I'm running a research facility, not a nuclear weapons lab,
and I resent the extra security burden this entails. 

Regards,

David Mathog
mathog at seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech 






