Blastall run from CGI script

Tim Cutts timc at chiark.greenend.org.uk
Fri Sep 15 07:28:09 EST 2000


>[blastall] WARNING: could not find index files...
>
>I've already used formatdb to create these index files.  The correct
>path is given in the CGI script.  My webserver (user, group: http) owns
>all of these files.  All of these files are in the home directory of
>http (not httpd).
>
>The .ncbirc file has been created with appropriate paths for Data and
>BLASTDB.  This file has been put in what I can imagine would be every
>possibly relevant directory (dir containing blastall, cgi-bin dir, http
>home dir).

Try setting the BLASTDB environment variable in your CGI script.  I
never bother with a .ncbirc file, myself.

i.e. put the following at the head of your CGI script:

$ENV{'BLASTDB'} = "/wherever/they/are";
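
If the warning persists after setting BLASTDB, it can help to check from inside the CGI script that the web server user can actually read the index files. What follows is only a minimal sketch. It assumes a single-volume database built with formatdb, which writes .nhr/.nin/.nsq files for nucleotide databases and .phr/.pin/.psq files for protein ones. The helper name missing_index_files and the database name mydb are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the formatdb index files that are missing or unreadable.
# $type is 'n' (nucleotide: .nhr/.nin/.nsq) or 'p' (protein: .phr/.pin/.psq).
# Assumes a single-volume database; multi-volume databases are named differently.
sub missing_index_files {
    my ($dir, $db, $type) = @_;
    return grep { !-r $_ } map { "$dir/$db.$type$_" } qw(hr in sq);
}

# Hypothetical values: substitute your own directory and database name.
$ENV{'BLASTDB'} = "/wherever/they/are";
my @missing = missing_index_files($ENV{'BLASTDB'}, 'mydb', 'n');
print STDERR "Not readable by the web server user: $_\n" for @missing;
```

Anything reported here goes to the web server's error log, and points to a path or permission problem rather than a blastall problem.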

You may find that you need to enhance your script considerably to avoid
the following two problems:

1)  Some web browsers time out if the search takes a long time, and the
output will never make it.

2)  If the user gets bored and hits the stop button, this does not stop
the search from running to completion, which is a waste of CPU time.

To fix both of these problems, you need to do some clever tricks:

1)  Store the process ID of the perl script in a variable:

$ppid = $$;

2) Set up a signal handler for SIGPIPE, such that if you receive that
signal you kill process $ppid and all its children.

3)  Before you execute the blastall, fork() so that there
are two copies of the script running, one of which (A) knows the process
ID of its child (B), and both of which know the original process ID.

4) Start printing some sort of progress information to STDOUT in process A:

while (1) {
  sleep(10);
  print "Still waiting for results\n";
}

This serves three purposes: (a) the user can see that something is actually
happening, (b) the browser won't time out, and (c) the print will raise a
SIGPIPE if the browser has gone away because the user hit the stop button.

5)  In process B, start the blast, using a standard open() function,
and parse the output however you like to STDOUT (I used to replace
sequence IDs with hotlinks to an SRS server).

6)  When the BLAST has finished, kill process $ppid and all its
children.
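
The six steps above can be put together roughly as follows. This is a sketch under stated assumptions, not the code from the FASTA server mentioned below. The function name run_with_progress is hypothetical. It also makes two variations on the steps: process B signals completion with SIGUSR1 so that process A can exit cleanly instead of being killed, and process B is put in its own process group (setpgrp) so that a SIGPIPE can stop the search without touching the web server's own processes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;    # unbuffered STDOUT, so a vanished browser is noticed promptly

# Run @$cmd, copying its output to STDOUT while printing a progress line
# every $interval seconds. Returns the exit status of process B.
sub run_with_progress {
    my ($cmd, $interval) = @_;
    my $ppid = $$;                        # step 1: original process ID
    my $group;                            # process group of B and the search
    my $done = 0;

    # Step 2: on SIGPIPE, kill B's process group (B plus the search).
    local $SIG{PIPE} = sub { kill '-TERM', $group if $group; exit 1 };
    # Variation: B reports completion with USR1 instead of killing A.
    local $SIG{USR1} = sub { $done = 1 };

    my $pid = fork();                     # step 3
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                      # process B
        setpgrp(0, 0);                    # own group, separate from the web server
        $group = $$;
        my $status = 1;
        if (open(my $out, '-|', @$cmd)) { # step 5: list form, no shell involved
            print while <$out>;           # parse or reformat each line here
            $status = close($out) ? 0 : 1;
        } else {
            print "Could not start search: $!\n";
        }
        kill 'USR1', $ppid;               # step 6: tell A the search has finished
        exit $status;
    }

    $group = $pid;                        # process A
    until ($done) {                       # step 4: progress messages
        sleep $interval;                  # cut short when USR1 arrives
        print "Still waiting for results\n" unless $done;
    }
    waitpid($pid, 0);
    return $? >> 8;
}

# Example call (database and query names are placeholders):
#   print "Content-type: text/plain\n\n";
#   run_with_progress(['blastall', '-p', 'blastn', '-d', 'mydb',
#                      '-i', '/tmp/query.fa'], 10);
```

The list form of open() is used so that user-supplied values never pass through a shell, which matters in a CGI script.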

I have code to run FASTA in this way, somewhere, but goodness knows
exactly where.  I wrote it at my old job (the server is at
http://www.bio.cam.ac.uk/cgi-bin/fasta3/fasta3.pl)

The script only uses the above method if you select "Now while you
wait"; otherwise, by default, the search is submitted to a batch queue
and the results are mailed back to you.

Tim.






