IMPORTANT - New GenBank BLAST Database Search E-mail Server

Dave Kristofferson kristoff at GENBANK.BIO.NET
Tue Sep 3 13:08:44 EST 1991


GenBank is pleased to announce the availability of a new e-mail server
for database similarity searches.  The BLAST program has been made
available to us by NCBI (reference below) and instructions for its use
are appended below.

Currently the server allows searches of the latest quarterly releases
of GenBank, PIR, and SWISS-PROT.  Access to the EMBL database and the
daily GenBank and EMBL updates will be added soon.

The blastn and blastp programs, for nucleic acid and protein searches
respectively, are available now; other options may be added in the
near future as computing resources permit.

We are starting this off by limiting execution to two queues, one for
nucleic acid searches (i queue) and one for protein searches (h
queue).  Only one job will execute at a time in each queue.  BLAST is
very fast, so we do not expect the queues to become very lengthy, but
we will monitor the situation and make adjustments as needed.

Although we believe that the system is operating correctly, please
assist us by reporting any bugs or other problems to
blast-req at genbank.bio.net.

The FASTA e-mail server (search at genbank.bio.net) also continues in
operation (see note below).

				Sincerely,

				David Kristofferson, Ph.D.
				GenBank Manager

				kristoff at genbank.bio.net


----------------------------------------------------------------------

BLAST Mail Server Help Document


	      BLAST - Basic Local Alignment Search Tool


BLAST was developed by the National Center for Biotechnology
Information at the National Library of Medicine and kindly made
available for use on the GenBank On-line Service.  The program employs
a heuristic search algorithm to compare an amino acid query sequence
against a protein sequence database or a nucleotide query sequence
against a nucleotide sequence database.  The BLAST program compares
sequences with databases using an ungapped alignment algorithm:

	S. F. Altschul, W. Gish, W. Miller, E. W. Myers and 
	D. J. Lipman (1990) J. Mol. Biol.  215, 403-410.

If you use BLAST as a research tool, we ask that this reference be
cited in your paper.

You can access the GenBank BLAST Mail Server through a number of
different networks, including Internet, BITNET, EARN, NETNORTH and
JANET.

The GenBank BLAST server allows you to send a specially formatted mail
message containing the nucleic acid or protein query sequence to the
BLAST Server at GenBank.  A BLAST sequence similarity search is then
performed against the specified database using the BLAST algorithm.



			 **** DISCLAIMER ****

GenBank provides access to several different search algorithms.
Please note that we make no claim that the results from either one of
our servers (BLAST vs. FASTA) are to be preferred over the other.
While BLAST is faster than FASTA, users should come to their own
conclusions about search sensitivity based on a comparison of their
own results before deciding which algorithm is suited for their
purposes.  Yet another search algorithm (FASTDB) is available only for
interactive use on GOS.  Please be aware that all of these programs
may produce somewhat different search results.  To obtain instructions
for the FASTA e-mail server, send the message HELP to the address
search at genbank.bio.net (leave the Subject: line blank).

                         ********************


Accessing the BLAST program

To access the program, send an electronic mail message containing the 
formatted query sequence (as described below) to the following Internet 
address:

	BLAST at GENBANK.BIO.NET   

If you are not on Internet, you may need to change the format of the
address.  Consult your systems manager to determine the correct
address format.


Obtaining Help

If you would like to receive instructions on using the BLAST program,
send a mail message to the address above containing the word "HELP" on
a single line of the mail message.  Leave the Subject line in the mail
header blank.  The BLAST manual page, which describes specific BLAST
program functions, is appended to the end of the help text.  For
additional help on using BLAST, contact GenBank at (415) 962-7307 or
send an electronic mail message to the address:

		      CONSULTANT at GENBANK.BIO.NET


Databases for use with BLAST

The following databases are currently available for BLAST searches:

   Designator                  Database
   ----------                  --------
   GenBank                     Latest GenBank quarterly release.

   SWISS-PROT         	       All of the SWISS-PROT protein database.

   PIR			       All of the PIR protein database.

GenBank is a nucleic acid sequence database, and SWISS-PROT and PIR
are protein sequence databases.  Currently BLAST (which uses a
compressed database format) does not search the daily GenBank updates;
this remains to be implemented.  Other databases will be added soon.
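
As a rough illustration of the pairing described above (blastn with a
nucleic acid database, blastp with a protein database), the following
Python sketch checks a program name against a database designator.
The dictionary and function names are hypothetical and are not part of
the server; the designators are those in the table above, lowercased
for comparison.

```python
# Hypothetical helper, not part of the server: maps each database
# designator from the table above to the kind of sequence it holds.
DATABASE_KIND = {
    "genbank": "nucleic acid",
    "swiss-prot": "protein",
    "pir": "protein",
}

# blastn searches nucleic acid databases; blastp searches protein ones.
PROGRAM_KIND = {
    "blastn": "nucleic acid",
    "blastp": "protein",
}


def is_compatible(program, database):
    """Return True if the program and the database use the same sequence type."""
    prog_kind = PROGRAM_KIND.get(program.lower())
    db_kind = DATABASE_KIND.get(database.lower())
    # Unknown programs or databases are treated as incompatible.
    return prog_kind is not None and prog_kind == db_kind
```

For instance, is_compatible("blastn", "GenBank") is true, while
is_compatible("blastp", "GenBank") is false.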


Formatting a Query

Queries consist of a mail message with search parameters identifying
the program (blastp for proteins or blastn for nucleic acids), the
database to be searched, values related to the search, and the query
sequence to be used in the search.  The mail message has three
mandatory lines, one optional line, and a line identifying the query
sequence as described below.  These lines are typed into the body of
the mail message in the order shown below:


 Search 
Parameter	Mandatory			Explanation

BLASTPROGRAM	   Yes 		Indicates whether to perform a 
				nucleic acid (BLASTN)
				or protein (BLASTP) search. 

DATALIB		   Yes		This line specifies the database to be 
				searched (see the section above,
				"Databases for use with BLAST") and must
				be included in the message.  

MATCH		   No		Scoring value for a match (applicable to
				BLASTN only).  BLASTP uses the PAM120
				scoring matrix.

BEGIN		   Yes		This line must be included in the message. 
				No other information is typed on it.


The remainder of the message contains the query sequence in FASTA
format (described below; a complete sample query is also provided).

*NOTE*: all lines must be LESS THAN 80 characters in length; longer
lines will be truncated.

Only one query sequence is allowed per mail message and your sequence
must be in FASTA format.  IntelliGenetics format and GenBank database
file format are not currently accepted; however, it is possible to use
an editor to change the file to FASTA format.  The format includes a
mandatory comment line beginning with a greater-than sign ">" followed
by the name of the sequence, a space, and an optional note about the
sequence.  The sequence data begin on the next line without the
greater-than sign.  For example:


>AGREP4 Monkey SV40-like genomic segment promoting transcription.
ccccttcaaatctattacaaggtgagcgtctcgccaaggcaatgaaatcgcaatatgatg 
tttccatttactttggattatacgtcattataaa
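
If your sequence is not yet in this layout, a short script can produce
it.  The Python sketch below is one possible way to do so; the
function name to_fasta and the 60-character line width are our own
choices (the width matches the example above and stays under the
80-character limit), not requirements of the server.

```python
def to_fasta(name, sequence, note="", width=60):
    """Format a sequence as FASTA text with short sequence lines.

    width=60 mirrors the example in the help text; any value that
    keeps lines under 80 characters would do.
    """
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("sequence name must be non-empty with no spaces")
    if not 0 < width < 80:
        raise ValueError("width must be between 1 and 79")

    # Comment line: ">" + name, then a space and the optional note.
    header = ">" + name + (" " + note if note else "")
    if len(header) >= 80:
        raise ValueError("comment line must be less than 80 characters")

    # Strip all whitespace so the sequence can be re-wrapped cleanly.
    residues = "".join(sequence.split())
    chunks = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([header] + chunks)
```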



Sending the Query Sequence

Use your local mail program to send GenBank your query sequence.  Most
mail programs allow you to import a file containing your sequence into
the mail message.  You should import your sequence file into the mail
message on the line after "BEGIN".  Please follow the format in the
following example of a BLAST request PRECISELY, but note that the
program is case-insensitive, i.e.  either upper or lower case letters
may be used.


BLAST MAIL QUERY EXAMPLE

Note that the first four lines in the example below are a mail header
that is automatically created when you address a mail message.
Nothing need be entered for the Subject.  NOTE: the text that you
enter into the body of the message begins with the "BLASTPROGRAM"
keyword below (do not add blank lines in the message).  Each line of
information must be less than 80 characters in length.  Longer lines
will be truncated.


From:  drgene at someaddress.somewhere.edu Tue Jun 14 21:36:38 1988
Date:  14 Jun 1988 2129:02-PDT
To:    BLAST at GENBANK.BIO.NET  
Subject:  

BLASTPROGRAM blastn
DATALIB genbank
BEGIN
>BOVPRL GenBank entry BOVPRL from gbmam file.907 nucleotides. 
tgcttggctgaggagccataggacgagagcttcctggtgaagtgtgtttcttgaaatcat
caccaccatggacagcaaa


The example above uses the three mandatory keyword lines:
BLASTPROGRAM, DATALIB, and BEGIN.  The MATCH line can be used between
the DATALIB and BEGIN lines when using the blastn program (see above).
See above for a list of choices for the DATALIB line.
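
For users who prepare requests with a script, the body of such a
message could be assembled along the following lines.  This is only a
sketch: the function name build_blast_query is hypothetical, and the
form "MATCH <value>" for the optional line is an assumption, since
this document does not show that line written out.

```python
def build_blast_query(program, database, fasta_text, match=None):
    """Assemble the body of a BLAST mail query as plain text."""
    program = program.lower()
    if program not in ("blastn", "blastp"):
        raise ValueError("program must be blastn or blastp")
    if match is not None and program != "blastn":
        raise ValueError("MATCH applies to blastn only")

    # Mandatory keyword lines, in the order given in the help text.
    lines = ["BLASTPROGRAM " + program, "DATALIB " + database]
    if match is not None:
        # Assumed syntax: keyword followed by the scoring value.
        lines.append("MATCH " + str(match))
    lines.append("BEGIN")

    # The query sequence follows BEGIN; blank lines are dropped because
    # the help text says not to add any to the message.
    lines.extend(l.rstrip() for l in fasta_text.splitlines() if l.strip())

    for line in lines:
        if len(line) >= 80:
            raise ValueError("line is 80 characters or longer and would be truncated")
    return "\n".join(lines) + "\n"
```

The returned text is what would be typed into the body of the mail
message; the mail header is still supplied by your mail program.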

The completed mail message is then sent to the BLAST Server at
GenBank.  Once your message is received, it is placed in a batch queue
and processed in the order it is received.  If you would like to know
the status of the queues being processed, you can send a mail message
to the BLAST Server address (BLAST at GENBANK.BIO.NET) containing the
word "QUEUE" on a single line of the mail message (leave the Subject
field blank).  The BLASTP queue is labeled with the letter "h"; the
BLASTN queue is labeled with "i", e.g.,

 Rank     Execution Date     Owner     Job #   Queue   Job Name
  1st   Aug 22, 1991 16:33   kristoff    446       h   kristoff at genbank.bio.n
  1st   Aug 22, 1991 16:33   kristoff    447       i   kristoff at genbank.bio.n
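
A reply of this kind can be read by a script if desired.  The Python
sketch below assumes that every job line follows the column layout of
the two sample lines above (rank, date and time, owner, job number,
queue letter, job name); that layout is inferred from the sample only,
and the names parse_queue_listing and QUEUE_LABELS are hypothetical.

```python
import re

# One job line: rank, "Mon DD, YYYY HH:MM", owner, job number,
# single-letter queue, then the job name (possibly cut off by the server).
QUEUE_LINE = re.compile(
    r"^\s*(?P<rank>\d+(?:st|nd|rd|th))\s+"
    r"(?P<date>[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2})\s+"
    r"(?P<owner>\S+)\s+(?P<job>\d+)\s+(?P<queue>[a-z])\s+(?P<name>.*)$"
)

# Queue letters named in this document for the BLAST server.
QUEUE_LABELS = {"h": "BLASTP", "i": "BLASTN"}


def parse_queue_listing(text):
    """Return one dictionary per job line found in a QUEUE reply."""
    jobs = []
    for line in text.splitlines():
        m = QUEUE_LINE.match(line)
        if m is None:
            continue  # column header, blank line, or unrelated text
        job = m.groupdict()
        job["job"] = int(job["job"])
        job["program"] = QUEUE_LABELS.get(job["queue"], "other")
        jobs.append(job)
    return jobs
```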


Multiple jobs are currently permitted in the queues, but please limit
your zeal since others also use the service.  For example, submitting
ten jobs simultaneously would definitely be in bad taste.  We would
prefer it if, after submitting 2 - 4 jobs to the queues, you wait
until your results are received before submitting additional runs.  If
these conventions are repeatedly violated we will be forced to
implement automatic limitations on the queue as we have for the FASTA
"e" queue.


Handling the Results of a BLAST Search

When the results are returned, use your local mail program to view
them.  You can transfer the results of a BLAST search to a separate
disk file to free up space in your mail directory.  Consult the
documentation for your local mail program for the commands to read
and transfer mail.


Interpreting the Results of a BLAST Search

Please consult the BLAST manual section (appended to this file) for
complete details concerning result descriptions.


How to query the BLAST server queue

Please note that requests to the e-mail servers are placed in a queue
for processing.  Thus it may take a couple of minutes to get your
results back if many people have submitted requests at the same time.
This queuing provides efficient scheduling of resources.  To find out
what requests are queued, send the word "QUEUE" to
BLAST at GENBANK.BIO.NET.


Retrieving individual entries found in BLAST searches

Database entries can be retrieved by either locus name or accession
number.  To use the GenBank Retrieval System, send an electronic
message to RETRIEVE at GENBANK.BIO.NET containing as text (leave the
Subject: line blank) accession numbers and/or entry names, one per
line.  Multiple entries may be submitted in a single message using the
following format:

CHKTUBA
BNACYP
J02852
J02855

Each sequence found will be returned in a separate mail message.

The data banks are searched in the order: GenBank New Data, GenBank
current release, EMBL New Data, EMBL current release, GenPept New
Data, GenPept current release, and Swiss-Prot until a match is found.
If an entry exists in both GenBank and EMBL with the same accession
number (the usual case), a query on the accession number will return
the GenBank version of the entry.  If the EMBL-format version is
required, it can be retrieved from the file server at
NETSERV at EMBL-Heidelberg.DE (for instructions send a message containing
the line HELP to that address).  To retrieve GenPept entries, use the
LOCUS name of the corresponding GenBank entry followed by a _1, or _n
where n represents the nth coding region in that GenBank entry.  For
example, ASNTUBBA_1 is the GenPept LOCUS name for the translation of
the first coding region from GenBank entry ASNTUBBA.
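
The GenPept naming rule and the one-identifier-per-line request format
can be sketched in Python as follows.  Both function names are
hypothetical and serve only as an illustration; nothing here contacts
the server.

```python
def genpept_locus(genbank_locus, n=1):
    """Return the GenPept LOCUS name for the nth coding region of an entry."""
    if n < 1:
        raise ValueError("coding regions are numbered from 1")
    return "%s_%d" % (genbank_locus, n)


def build_retrieve_request(identifiers):
    """Return a message body with one locus name or accession number per line."""
    cleaned = []
    for ident in identifiers:
        ident = ident.strip()
        if not ident:
            continue  # skip empty items rather than emit blank lines
        if any(ch.isspace() for ch in ident):
            raise ValueError("identifier contains whitespace: %r" % ident)
        if ident not in cleaned:
            cleaned.append(ident)  # drop repeats, keep the original order
    return "\n".join(cleaned) + "\n"
```

For example, genpept_locus("ASNTUBBA") gives ASNTUBBA_1, the name used
in the example above.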

Please note that e-mail retrieval server requests are now placed in a
queue for processing rather than being handled immediately, as was the
case in the past.  Thus it may now take a couple of minutes longer to
get your entries back if many people have submitted requests at the
same time.  This inconvenience was necessary due to the increasing
popularity of the service.  All batch queues on GOS may be monitored
by sending the word QUEUE to SEARCH at GENBANK.BIO.NET.  Retrieval
requests are entered into the "g" queue.


	     IF YOU DO NOT FIND YOUR ENTRY ON THE SERVER:

Authors often request that data be held in confidence until after
publication even though they have already been assigned an accession
number for their data.  This adds an additional delay in data release
because the databank staff must ascertain that the data has appeared
in print.  If you have a reference to sequence data but cannot
retrieve the data from the e-mail server, please send the literature
reference and accession number (or locus name) to
UPDATE at GENOME.LANL.GOV and the data will be released to the server as
soon as verification of publication is made.


			   DATA SUBMISSION

An electronic version of the sequence data submission form used by the
sequence data banks is also available through the RETRIEVE server.  To
receive a copy, send a message containing the word DATASUB as the only
line.  Instructions for completing and submitting the form are
included.  We would appreciate it if you would use this form only if
you cannot use our free Authorin data submission software for the IBM
PC and Macintosh.  Copies of Authorin may be requested by sending
e-mail to authorin at genbank.bio.net.

If you have any questions or comments, feel free to mail them to
RETRIEVE-REQUEST at GENBANK.BIO.NET.


Obtaining BLAST 

BLAST is available by anonymous ftp from ncbi.nlm.nih.gov
[130.14.20.1] in the pub/blast directory.


End of BLAST Server Help

(BLAST man page omitted from here)


