Languages for bioinformatics
Andrew Dalke
dalke at ks.uiuc.edu
Fri Feb 27 06:19:15 EST 1998
Tom Walsh <tpwalsh at acer.gen.tcd.ie> said:
> I like the look of Python but a lot more people seem to be using
> Perl, especially in bioinformatics and at the end of the day, you
> have to consider what skills are in demand. Sadly, elegance of
> design is no guarantee of widespread use.
Well, I know of a couple of bioinformatics companies doing internal
development/prototyping in Python, but there aren't many. Before I go
into my advocacy rant, let me point you to www.python.org. Okay, I'm
ready:
<advocacy on>
I hope to convince you that Python is more useful than perl by
explaining some of the problems I've had using perl and asserting that
they are less of a hassle in Python.
I've been working on a 15,000+ line perl-based application for
manipulating bioinformatics databases and while perl gets things done,
there have been a lot of frustrations because some things I want are
hard to do, even in this small-to-medium-sized application:
1) "simple" complex data structures -- I don't even want to think
about writing a tree (eg, to describe an evolutionary tree) in perl.
For that matter, explain the syntax of how to use a reference to a
list used as the value in a hash element:
$data{values} = [ 'This', 'is', 'a', 'test'];
print (${$data{values}}[2], "\n");
while in python it is
data = {}
data['values'] = ['This', 'is', 'a', 'test']
print data['values'][2]
I had to get the O'Reilly book Advanced Perl Programming to learn
this, while learning it for Python was almost trivial. (And the author
of that book is a Python proponent.)
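To make the tree remark concrete, here is a minimal sketch of an evolutionary tree built from nested tuples. The species names and the helper functions leaves() and depth() are made up for illustration, and the code is written for current Python 3 rather than the 1.5 release discussed here.

```python
# Hypothetical example data: a tiny evolutionary tree as nested tuples.
# A leaf is a species name (str); an internal node is a tuple of subtrees.
tree = (("human", "chimp"), ("mouse", "rat"))

def leaves(node):
    """Return the leaf names of a nested-tuple tree, left to right."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaves(child))
    return names

def depth(node):
    """Number of branching levels below this node (a leaf has depth 0)."""
    if isinstance(node, str):
        return 0
    return 1 + max(depth(child) for child in node)

print(leaves(tree))  # ['human', 'chimp', 'mouse', 'rat']
print(depth(tree))   # 2
```

The same nesting syntax used for the list-in-a-dictionary example carries over unchanged to deeper structures.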
2) matrix support (useful in structure analysis)
3) orthogonal namespaces; for example, try to come up with a standard
toolkit in perl containing a routine for doing numeric sort
comparisons. Why doesn't this work?
package tools;
sub numerically { $a <=> $b };
package work;
@a = sort tools::numerically (5, 3, 4, 2, 1);
print "@a\n";
which prints
1 2 4 3 5
(Hint: $a and $b are "special" variables.)
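For comparison, a rough Python sketch of the same toolkit idea. The class named tools is only a stand-in for a separate module, and the code assumes current Python 3 (functools.cmp_to_key did not exist in 1.5, where a comparison function was passed to the list's sort method directly).

```python
from functools import cmp_to_key

class tools:
    """Stand-in for a separate module named tools (hypothetical)."""

    @staticmethod
    def numerically(x, y):
        # The two values arrive as explicit arguments, so nothing
        # depends on package-level variables like perl's $a and $b.
        return (x > y) - (x < y)

# Use the comparison routine from "outside" the tools namespace.
result = sorted([5, 3, 4, 2, 1], key=cmp_to_key(tools.numerically))
print(result)  # [1, 2, 3, 4, 5]
```

Because the comparison function receives its operands as parameters, it behaves the same no matter which namespace calls it.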
4) generic programming; for example, write a threshold script to
return those elements of a list which are less than a given value. In
perl you have to know if the value is numeric or string to know to use
'<' or 'lt'; that is, which of these should you use?
grep( $_ < $value, @list)
grep( $_ lt $value, @list)
In python the following will work on any data type that supports '<':
def threshold(value, list):
    a = []
    for ele in list:
        if ele < value:
            a.append(ele)
    return a
or more succinctly:
def threshold(value, list):
    return filter( lambda x, v = value: x < v, list )
(Though I find perl's grep easier to read.)
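A short usage sketch of the same idea, showing one function handling both numbers and strings. It is written for current Python 3 with a list comprehension, which is an assumption on top of the original; the parameter name items replaces list only to avoid shadowing the builtin.

```python
def threshold(value, items):
    """Return the elements of items that compare less than value."""
    return [ele for ele in items if ele < value]

# The same function works for any type that supports '<'.
print(threshold(3, [5, 3, 4, 2, 1]))         # [2, 1]
print(threshold("c", ["d", "a", "c", "b"]))  # ['a', 'b']
```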
5) When do you use local() as compared to my()? Almost never, but you
need it for file and directory handles. Unless you use DirHandle,
which is somewhat quickly dismissed in the perl5 book. Try explaining
all this to new users of perl.
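By way of contrast, a small sketch of how this looks in Python, where file handles and directory listings are ordinary values held in ordinary local variables. The function names count_lines and list_names and the file name example.txt are hypothetical, and the code assumes current Python 3.

```python
import os
import tempfile

def count_lines(path):
    # 'handle' is a plain local variable; it needs no special
    # declaration and disappears when the function returns.
    with open(path) as handle:
        return sum(1 for _ in handle)

def list_names(directory):
    # A directory listing is just a list of strings.
    return sorted(os.listdir(directory))

# Build a throwaway directory so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w") as out:
        out.write("line 1\nline 2\n")
    n_lines = count_lines(path)
    names = list_names(tmp)

print(n_lines)  # 2
print(names)    # ['example.txt']
```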
6) Also, explain how to make a new class, including blessing it
properly, setting up @ISA and overloading operators. I've stared at
the books for many hours and still do it by starting from the
examples and going by touch and feel.
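For reference, a minimal sketch of the Python counterpart: a class, a subclass, and two overloaded operators. The Sequence and DNA classes and the gc_count method are invented for illustration, and the code assumes current Python 3.

```python
class Sequence:
    """Hypothetical base class holding a name and a residue string."""

    def __init__(self, name, residues):
        self.name = name
        self.residues = residues

    def __len__(self):
        # Overloads len(seq).
        return len(self.residues)

    def __add__(self, other):
        # Overloads seq1 + seq2; the result has the left operand's class.
        return self.__class__(self.name + "+" + other.name,
                              self.residues + other.residues)

class DNA(Sequence):
    """Subclass; inheritance is declared in the class statement."""

    def gc_count(self):
        return self.residues.count("G") + self.residues.count("C")

a = DNA("a", "GATTACA")
b = DNA("b", "GGCC")
joined = a + b
print(joined.name, len(joined), joined.gc_count())  # a+b 11 6
```

The base class list in the class statement plays the role of @ISA, and there is no separate blessing step.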
These sorts of difficulties do not often occur in small (<5,000 line)
programs, so a lot of things work well with perl. However, when
you get to what Python advocates call "programming in the large" these
problems get harder to manage.
I've done a lot of C++, perl and tcl development, and find python
easier to use, understand, explain and support.
<end advocacy>
Andrew Dalke
P.S.
With the 1.5 version, Python now has "perl5 compatible regular
expressions" which run at about 1/2 the speed of perl's against the
prosite patterns. But then, perl5 is about 1/2 the speed of perl4 for
that.