BLAST 2 info needed

mathog at seqaxp.bio.caltech.edu
Wed Nov 11 11:08:36 EST 1998


In article <3644FD16.1C3418B1 at compusmart.ab.ca>, Andrew Gardner <andrewg at compusmart.ab.ca> writes:
>I am working on a BLAST client program.  I am able to run BLAST 1
>searches over the web at NCBI by sending requests to the URL recommended
>in the documentation at ftp://ncbi.nlm.nih.gov/blast/blasturl/
>

After my signature you'll find such a beast.  It's written in DCL,
so unless you're on a VMS system it won't do you much good directly, but
you should be able to read it to see how it works.  Basically the trick
for writing command line clients (that is what you're doing, right?) for
web servers is to save the submission page from your browser, write the
client so that it can set every field on that form, then use a program
like rep_client (which was in the demo on the NCBI site) to stuff the
request into the server and wait for the response.  How you handle
waiting around for the response is OS specific though.
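
For readers not on VMS, the following is a minimal Python sketch of the
request file that the DCL procedure below assembles before handing it to
rep_client.  The host, the POST line, the "NAME value" field lines and the
BEGIN marker are copied from the procedure; the function name and the
example sequence are hypothetical, and none of this has been checked
against the current NCBI server.

```python
# Sketch only: mirrors the request layout written by SAF_BLAST.COM.
HOST = "www.ncbi.nlm.nih.gov"
# Copied verbatim from the DCL procedure, not verified against the server.
POST_LINE = "POST /cgi-bin/BLAST/nph-blast_report HTTP/1.0X"


def build_request(fields, fasta_text):
    """Return the text that would be fed to rep_client on standard input.

    fields is a mapping of form field name to value (hypothetical helper).
    """
    lines = [HOST, POST_LINE]
    for name, value in fields.items():
        # Like the DCL loop, skip any field that was left empty.
        if value != "":
            lines.append(f"{name} {value}")
    lines.append("BEGIN")
    # The FASTA-format query sequence follows the BEGIN marker.
    return "\n".join(lines) + "\n" + fasta_text


if __name__ == "__main__":
    example = build_request(
        {"PROGRAM": "blastn", "DATALIB": "nr", "EXPECT": "default", "HISTOGRAM": ""},
        ">query (made-up example)\nACGTACGTACGT\n",
    )
    print(example, end="")
```

Keeping the field list as plain data means a field added to the saved
submission page only needs one more entry in the mapping.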

I vaguely recall that some of the NCBI servers will send you back a file
in text mode, while others insist on HTML unless you tell them to email
the result, in which case you can get text.  (They may have changed this
in the 10 months since I wrote the client.)  The client below leaves a
text mode result file on disk.
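
The "start the client, then poll until it is gone" step in the procedure
below is the OS specific part.  A rough Python equivalent might look like
the sketch below.  The function name and its arguments are hypothetical,
rep_client itself is not reproduced, and any command that reads the
request on standard input and writes the reply to standard output could
be passed as client_argv.  The 20 second default mirrors the procedure's
"wait 00:00:20.00".

```python
import subprocess
import sys
import time


def run_and_wait(client_argv, request_path, result_path, poll_seconds=20.0):
    """Start the client, poll until it exits, and return its exit code.

    The request file is connected to the client's standard input and the
    result file to its standard output.
    """
    with open(request_path, "rb") as req, open(result_path, "wb") as out:
        proc = subprocess.Popen(client_argv, stdin=req, stdout=out)
        # poll() returns None while the child process is still running.
        while proc.poll() is None:
            print("still waiting at", time.strftime("%H:%M:%S"))
            time.sleep(poll_seconds)
    print("job completed, results in", result_path)
    return proc.returncode
```

For a dry run without any network access, a stand-in client such as
[sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
simply copies the request to the result file.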

Regards,

David Mathog
mathog at seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech 

**************************************************************************

$! SAF_BLAST.COM
$!
$! 7-JAN-1998, David Mathog, Biology Division, Caltech
$!
$! Command procedure to access the BLAST server via the web interface.
$!
$! See below for the symbols it looks for and command line format.
$!
$! subroutine to list program options
$!
$ listprogram: subroutine
$   type sys$Input
Select the program to run, your options are:

  Program    Query   against->    Database
  ------------------------------------------------
  blastn     Nuc                  Nuc
  blastp     Pep                  Pep
  tblastn    Pep                  Nuc->Pep (6 frames)
  tblastx    Nuc->Pep (6 frames)  Nuc->Pep (6 frames)
  blastx     Nuc->Pep (6 frames)  Pep

$ exit
$ endsubroutine
$!
$! subroutine to list datalib options
$!
$ listdatalib: subroutine
$   type sys$Input

Select the datalib to search, your options are:

  nr          nonredundant (Pep or Nuc)
  month       all entries < 30 days old (Pep or Nuc)
  swissprot   Swiss Protein (Pep)
  dbest       nonredundant EST sequences (Nuc)
  dbsts       nonredundant STS sequences (Nuc)
  pdb         Sequences from 3D structure files (Pep)
  vector      Vectors (Nuc)
  kabat       Sequences of immunological interest (Pep)
  mito        Mitochondrial (Nuc)
  alu         Some ALU sequences from REPBASE (Nuc)
  epd         Eukaryotic Promoter Database (Nuc)
  yeast       S.cerevisiae genome (Nuc, or coding sequences, Pep)
  gss         Genome survey sequences (Nuc)
  htgs        High throughput genomic sequences (Nuc)
  E.coli      E. coli genome (Nuc, or coding sequences, Pep)

$ exit
$ endsubroutine
$!
$! subroutine to list command line options, and symbols
$!
$ listcommand: subroutine
$   type sys$Input

Usage:  @saf_blast P1  P2 P3

  Where:

  P1  name of query sequence file, such as "sequence.gcg"
  P2  (Optional) Comma separated list for fields to prompt for
       if they are not supplied by symbols (see below.)
       Example: "EXPECT,CUTOFF" would prompt for the EXPECT and CUTOFF
       values to use with the search. "OUTFILE" would prompt for
       an output file name.
  P3  (Optional) "Start,End" - limit the query to this region of
      the sequence. Must be enclosed in double quotes.  Examples:
         "1000,2000"   from 1000 to 2000 inclusive
         "1000,"       from 1000 to the end
         ",2000"       from 1 to 2000
         ","           the whole sequence
       If you want to specify P3 and not P2 (or P1), you 
       must use this syntax:  blast "" "" P3


If blast_FIELD symbols are defined they will override defaults.
The symbols this procedure looks for are:

 blast_INFILE    The GCG formatted query sequence (input file)
 blast_OUTFILE   Name for BLAST output file

 blast_PROGRAM   blastn, blastp, tblastn, tblastx, blastx
 blast_DATALIB   nr, month, swissprot, dbest, dbsts, pdb, vector, 
                    kabat, mito, alu, epd, yeast, gss, htgs, E.coli

 blast_EXPECT    default, or any floating point number
 blast_CUTOFF    default or any number >=0
 blast_MATRIX    default, BLOSUM62, PAM40, PAM120, PAM250, IDENTITY
 blast_STRAND    both, top, bottom
 blast_FILTER    default, none, dust, SEG, SEG+XNU, XNU
 blast_HISTOGRAM if set, a histogram is drawn, default is none
 blast_NCBI_GI   if set, show NCBI gi numbers in output, default is not to
 blast_DESCRIPTIONS default or any number >=0
 blast_ALIGNMENTS   default or any number >= 0
 blast_ADVANCED    other BLAST command line options
 blast_EMAIL     send response via email to address it holds
 blast_HTML      send response in HTML format

$ exit
$ endsubroutine
$!
$! define symbols for program used
$!
$ hereis = f$environment("PROCEDURE")
$ hereis = f$element(0,"]",hereis) + "]"
$ rep_client = "''hereis'rep_client"
$ FIELDS ="INFILE,OUTFILE,PROGRAM,DATALIB,EXPECT,MATRIX,STRAND,FILTER,HISTOGRAM,NCBI_GI,DESCRIPTIONS,ALIGNMENTS,ADVANCED,EMAIL,PATH"
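$! note: CUTOFF and HTML are prompted for below but are not in FIELDS,
$! so they are never written to the request
$! path is a bit different, called EMAIL externally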
$ PATH=""
$!
$!
$ promptfor = "''P2'"
$ promptfor = f$edit(promptfor,"COLLAPSE,UPCASE")
$!
$! INFILE
$!
$ askfor = "INFILE"
$ default = "''P1'"
$ blab = ""
$ gosub doprompt
$ if ("''INFILE'" .eqs. "")
$ then
$   call LISTCOMMAND
$   exit
$ endif
$!
$ if (P3 .nes. "")
$ then
$   startfrom = f$element(0,",",P3)
$   startfrom = f$EDIT(startfrom,"COLLAPSE")
$   endat     = f$element(1,",",P3)
$   endat     = f$EDIT(endat,"COLLAPSE")
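$!  use string comparison (.eqs.): f$element returns "," when P3 holds no
$!  comma, and a numeric .eq. would also reject an empty start or end
$   if(startfrom .eqs. ",")then goto badp3
$   if(startfrom .eqs. "")
$   then
$     section = ""
$   else
$     section = "/begin=''startfrom'"
$   endif
$   if(endat .eqs. ",")then goto badp3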
$   if(endat .nes. "")
$   then
$     section = section + "/end=''endat'"
$   endif
$ endif
$ goto notbadp3
$!
$ badp3:
$   write sys$output "The range you specified [''P3'] is invalid"
$   Type sys$input

   Use one of these forms:

         "1000,2000"   from 1000 to 2000 inclusive
         "1000,"       from 1000 to the end
         ",2000"       from 1 to 2000
         ","           the whole sequence

   The double quotes on each side are MANDATORY

   If you want to specify P3 and not P2 (or P1), you 
   must use this syntax:  blast "" "" P3

$  exit
$!
$notbadP3:
$!
$ time = f$time()
$ killstring = f$cvtime(time,,"hour") + -
      f$cvtime(time,,"minute") + -
      f$cvtime(time,,"second") +  -
      f$cvtime(time,,"hundredth")
$ killfile = "KILL_" + killstring + ".seq"
$ comfile  = "KILL_" + killstring + ".com"
$ mypid = f$getjpi("","PID")
$ back= f$extract(4,4,mypid)
$ subname = "K" + back + killstring
$!
$ tofasta/infile='infile'/out='killfile' 'section'/default
$!
$! PROGRAM
$!
$ checkstring="blastn blastp blastx tblastx tblastn"
$ if (f$type(BLAST_PROGRAM) .eqs. "")then promptfor = promptfor + ",PROGRAM"
$ topprogram:
$ askfor = "PROGRAM"
$ default = "blastn"
$ blab    = "LISTPROGRAM"
$ gosub doprompt
$ PROGRAM = f$EDIT(PROGRAM,"COLLAPSE,LOWERCASE")
$ if(f$length(checkstring) .eq. f$locate(PROGRAM,checkstring))
$   then
$   write sys$Output "''PROGRAM' is not a valid option for PROGRAM"
$   goto topprogram
$ endif
$!
$! OUTFILE
$!
$!
$! come up with a name for the output file, if one is not supplied
$!
$ askfor = "OUTFILE"
$ default = infile
$! strip off any nasty characters which might be in it now
$! looks for [] or logical: and removes them
$!
$ tdefault = f$element(1,"]",default)
$ if (tdefault .eqs. "]")then tdefault = default
$ default = tdefault
$ tdefault = f$element(1,":",default)
$ if (tdefault .eqs. ":")then tdefault = default
$ default = tdefault + "."
$ default = f$element(0,".",default)
$ default = default + ".''PROGRAM'"
$ write sys$output "output will be ''default'"
$ gosub doprompt
$!
$!  DATALIB
$!
$ checkstring="nr month swissprot dbest dbsts pdb vector kabat mito alu epd yeast gss htgs e.coli"
$ topDATALIB:
$ askfor = "DATALIB"
$ default = "nr"
$ blab    = "listdatalib"
$ gosub doprompt
$ DATALIB = f$EDIT(DATALIB,"COLLAPSE,LOWERCASE")
$ if(f$length(checkstring) .eq. f$locate(DATALIB,checkstring))
$ then
$   write sys$Output "''DATALIB' is not a valid option for DATALIB"
$   goto topDATALIB
$ endif
$!
$! this next may or may not be necessary, BLAST documentation is unclear
$!
$ if(DATALIB .eqs. "e.coli")then DATALIB = "E.coli"
$!
$! EXPECT
$!
$ askfor = "EXPECT"
$ default = "default"
$ blab    = ""
$ gosub doprompt
$!
$! CUTOFF
$!
$ askfor = "CUTOFF"
$ default = "default"
$ gosub doprompt
$!
$! MATRIX
$!
$ askfor = "MATRIX"
$ default = "default"
$ gosub doprompt
$!
$! STRAND
$!
$ askfor = "STRAND"
$ default = "both"
$ gosub doprompt
$!
$! FILTER
$!
$ askfor = "FILTER"
$ default = "default"
$ gosub doprompt
$!
$! HISTOGRAM
$!
$ askfor = "HISTOGRAM"
$ default = ""
$ gosub doprompt
$!
$! NCBI_GI
$!
$ askfor = "NCBI_GI"
$ default = ""
$ gosub doprompt
$!
$! HTML
$!
$ askfor = "HTML"
$ default = ""
$ gosub doprompt
$!
$! DESCRIPTIONS
$!
$ askfor = "DESCRIPTIONS"
$ default = "100"
$ gosub doprompt
$!
$! ALIGNMENTS
$!
$ askfor = "ALIGNMENTS"
$ default = "100"
$ gosub doprompt
$!
$! ADVANCED
$!
$ askfor = "ADVANCED"
$ default = ""
$ gosub doprompt
$!
$! EMAIL/PATH
$!
$ askfor = "EMAIL"
$ default = ""
$ gosub doprompt
$ if(EMAIL .nes. "")
$ then
$   PATH  = EMAIL
$   EMAIL = "IS_SET"
$ endif
$!
$!  now assemble the command file to send
$!
$!
$! create a stream-lf file because the sequence will also be stream lf,
$! and otherwise append generates warnings.  STREAMLF must be a system
$! wide symbol that maps to something like:
$! create/fdl=shrdisk:[shared.misc]streamlf.fdl
$!
$ streamlf 'comfile'
$ open/append ofil: 'comfile'
$!
$ write ofil: "www.ncbi.nlm.nih.gov"
$ write ofil: "POST /cgi-bin/BLAST/nph-blast_report HTTP/1.0X"
$! write ofil: "WWW_BLAST_TYPE unfin_gen"
$!
$! skip INFILE,OUTFILE, those don't go to NCBI
$!
$ count = 2
$ allfields:
$ string = f$element(count,",",fields)
$ if(string .nes. ",")
$ then
$   value = 'string'
$   value = "''value'"
$   if(value .nes. "")then write ofil: "''STRING' ''VALUE'"
$   count = count + 1
$   goto allfields
$ endif
$ write ofil: "BEGIN"
$ close ofil:
$ append 'killfile' 'comfile'
$ delete 'killfile';
$!
$! run it in a subprocess so that we can keep track of it
$!
$! type 'comfile'
$ create 'outfile'
$ spawn/nowait/input='comfile'/output='outfile'/process='subname' -
  run 'rep_client'
$!
$! find the darn subprocess!
$!
$ context = ""
$ findsub:
$    apid = f$pid(context)
$    if(apid .eqs. "")
$    then
$      write sys$Output "Fatal error, connection to NCBI died"
$      exit
$    endif
$    procname = f$getjpi(apid,"PRCNAM")
$    if(procname .nes. subname)then goto findsub
$!
$! apid is its ID, procname is its process name
$!
$ write sys$output "Now processing job, subprocess is ''APID'"
$ 
$ waiting:
$   procname = ""
$   define/user/nolog sys$error nla0:
$   define/user/nolog sys$output nla0:
$   procname = f$getjpi(apid,"PRCNAM")
$   deass/user sys$error
$   deass/user sys$output
$   if (procname .eqs. "")then goto done
$   time=f$time()
$   write sys$output "still waiting at ''time'"
$   wait 00:00:20.00
$   goto waiting
$!
$ done:
$ time=f$time()
$ write sys$output "BLAST job completed at ''time', results in ''OUTFILE'"
$ delete 'comfile';
$ exit
$!
$! prompt routine.  Set response to default value, then override that
$!   with symbol's value (if it exists), and lastly, override that with
$!   a prompt, if it was asked for on the command line
$!
$ doprompt:
$   'ASKFOR' = default
$   if (f$type(BLAST_'ASKFOR') .nes. "")then 'ASKFOR' = blast_'ASKFOR'
$!
$   if(f$locate(askfor,promptfor) .eq. f$length(promptfor))then return
$   if("''blab'" .nes. "")then call 'blab'
$   READ/PROMPT="Enter a value for ''ASKFOR': "  sys$command response
$   'askfor' = response
$   return



