open-source software for bioinformatics (was Re: Unix vs Linux - the movie.)

John S. J. Anderson jacobs+usenet at genehack.org
Mon Aug 14 22:25:01 EST 2000


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

jkb at arran.mrc-lmb.cam.ac.uk (James Bonfield) writes:

> In <87snsed3f4.fsf at genehack.org> "John S. J. Anderson"
> <jacobs+usenet at genehack.org> writes:

(Apologies for the delay in responding...)

[ gap4 codebase size ]
> > How much 'core' code versus 'interface'?
[snip]
> The main gap4 directory has about 90K lines of C (perhaps 10K of
> that is interface) and 18K lines of Tcl (nearly all interface).
> 
> The other bits are libraries, some of which is interface, but most
> of which isn't (eg file formats, dynamic arrays, IO, database
> handling, etc).

So, about 25% interface, 75% 'real code'. Is that fair?
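
A quick back-of-the-envelope check of that split, as a sketch only. It assumes the 10K of C and the 18K of Tcl quoted above are the whole interface, treats "nearly all" of the Tcl as all of it, and ignores the separate libraries. The function name is made up for illustration.

```python
def interface_share(interface_loc, total_loc):
    """Return the interface portion of a codebase as a percentage."""
    return 100.0 * interface_loc / total_loc

# Approximate figures quoted above for the main gap4 directory.
c_total, c_interface = 90_000, 10_000
tcl_total, tcl_interface = 18_000, 18_000  # assumption: "nearly all" counted as all

total = c_total + tcl_total               # 108,000 lines
interface = c_interface + tcl_interface   # 28,000 lines
share = interface_share(interface, total)

print(f"interface: {share:.1f}%, core: {100 - share:.1f}%")
# prints "interface: 25.9%, core: 74.1%"
```

Under those assumptions the rough 25/75 estimate holds.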

> Anyway - none of these separate figures are really important except
> the total; that's what would need to be "reviewed" after all.

Well, that's going to vary from case to case, really. Yes, if you
publish a paper that's dependent on the whole 108 kLOC, then it should
all be reviewed. If you publish several papers, documenting the
ongoing assembly of this large piece of software, then each review is
a more manageable job.

I'm also not convinced that "it's going to be really hard" is a
valid counter-argument to the following logic:


   Given that:
   Peer review is done to make sure the conclusions of papers are
   'correct'.
   
   And given that:
   For some papers, software plays a critical role in determining the
   conclusions of the paper.

   And, finally, given that:
   It is not possible (or, at least, it is orders of magnitude more
   difficult) to determine if a given piece of software produces the
   'correct'/intended results without access to the source code.

   THEREFORE
   Access to the source code of software used in reaching the
   conclusion(s) of a paper is required for a proper, thorough
   peer review of the paper.

What am I missing?

> Ok, I'm inclined to agree. There are some bits of code which end up
> being used in ways not originally thought of and yet still work
> perfectly. To me that's a good indication of good design (although
> it may still look ugly). As a person with interests in IOCCC
> (obfuscated C) I know I _can_ write ugly code, but I hope this also
> teaches me what to avoid. There is no 'deliberately' obfuscated code
> in our software :-)

Well, deliberate obfuscation is in some ways better: if you know it
was fscked up on purpose, you can at least assume a certain level of
competence (and maliciousness) on the part of the coder. When you're
not sure, you can't tell if the coder was terribly brilliant, or just
terrible. 

john.


- -- 
- ----------------------------------------------------------------------------
           [ John S Jacobs Anderson ]------><URL:mailto:jacobs at genehack.org>
[ Genehack: Not your daddy's weblog ]------><URL:http://genehack.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.2 (GNU/Linux)
Comment: Mailcrypt 3.5.5 and Gnu Privacy Guard

iD8DBQE5mLgMWRJRdOm3KFARArXvAJ9qOBynlBO6E20Xj41dVBz8+BBGVQCfRUyM
zRDR6O6g3M2p7N0fpJdAqco=
=quVA
-----END PGP SIGNATURE-----


More information about the Bio-soft mailing list