What you've been waiting for (beta)

George Johnson geojohn at casbah.acns.nwu.edu
Sat Jun 5 23:45:24 EST 1993


Okay, this is a beta test.  Please don't spaz out!  This information
is from the Kabat database and must be cited as such.  The
search is performed by Seqhunt, unpublished results.  Contact
george at immuno.esam.nwu.edu ONLY.  Behave yourselves, or the beta
will fail!

Regards,

George

Searching the Kabat Database of Sequences of Proteins of Immunological 
Interest with Seqhunt.


Description
===========================================================================

Seqhunt is a set of routines we use here to search and analyze the Kabat 
database.  Recent modifications to the Seqhunt routines have allowed us 
to offer searches of the database to others by electronic mail.

Most of the database can be searched through the electronic mail 
interface.  Pseudogenes, D-minigenes, and J-minigenes cannot currently 
be searched.

**IMPORTANT**

Seqhunt is NOT an alignment program.  Sequences in the database were
aligned by visual inspection, with the alignment forced to fit the Kabat
numbering system.  If your request finds matches, the sequences that come
back are already aligned.  You may use these aligned sequences as
SUGGESTIONS for how you might align your own sequence.  For example,
aligning the third complementarity-determining region (CDR3) of the heavy
chains depends on finding where the D minigene region ends and the J
minigene region begins.  This is not always possible, and single base
replacements in the rearranged sequence compound the problem.  So, if a
codon or amino acid does not match perfectly, where does it go: with the D
or the J?  That is a problem you have to work out visually.  Your results
may then include many different aligned matches, which you can use as aids
in the alignment process.  Seqhunt was originally written for this purpose.


Obtaining access to Seqhunt (IMPORTANT)
===========================================================================

Seqhunt is available without charge to researchers and others interested 
in locating information in the Kabat database.  Because each search 
requires computing time, we ask that you observe these guidelines:

1.  Please send us your name and all electronic mail addresses which you
    might use to send requests.  We need all the addresses for people who
    work with more than one machine or account.
   
2.  Please limit the number of requests you send to 10 a night.  If you
    wish to send more requests, please get in touch with us personally
    so that we can perform the searches for you (we can do it quicker).

Only people who have sent in their name and electronic mail address will
have access to Seqhunt.  We keep a file of all the electronic mail
addresses we receive and check it before each request is processed.  If
you have not sent in a valid electronic mail address, your request will
not be processed.

Once we have added your electronic mail address, we will notify you.
After that, you may begin sending requests.


Types of Searches
===========================================================================

Eight search types are available through electronic mail.  They are as 
follows:

nsa :  Nucleotide String Antibody
       Search for a pattern match with the desired antibody specificity 
       for immunoglobulins or classification for TCR.

asa :  Amino Acid String Antibody
       Search for a pattern match with the desired antibody specificity
       for immunoglobulins or classification for TCR.

nsr :  Nucleotide String Reference
       Search for a pattern match with the desired reference.

asr :  Amino Acid String Reference
       Search for a pattern match with the desired reference.

nsn :  Nucleotide String Name
       Search for a pattern match with the desired name.

asn :  Amino Acid String Name
       Search for a pattern match with the desired name.

None of the above searches requires an exact match to the whole field;
the pattern only has to occur somewhere within it.  For example, if you
enter HIV as the search pattern in an nsa search, all sequences whose
antibody specificity contains HIV will be returned.
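The following Python sketch illustrates what "not exact" means for the string searches: a pattern matches when it occurs anywhere in the chosen field.  The entries, the field name `specificity`, and the helper `string_search` are made up for illustration; this is not how Seqhunt stores or searches the database, and the case-sensitive comparison is an assumption.

```python
from typing import Dict, List

# Hypothetical, invented entries; real Kabat records carry many more fields.
Entry = Dict[str, str]

def string_search(entries: List[Entry], field: str, pattern: str) -> List[Entry]:
    """Return every entry whose `field` contains `pattern` anywhere in it.

    The comparison is case-sensitive here; how Seqhunt treats case is not stated.
    """
    return [e for e in entries if pattern in e.get(field, "")]

entries = [
    {"name": "A1", "specificity": "anti-HIV gp120"},
    {"name": "B7", "specificity": "phosphorylcholine"},
    {"name": "C3", "specificity": "HIV-1 p24"},
]
print([e["name"] for e in string_search(entries, "specificity", "HIV")])  # ['A1', 'C3']
```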

nm  :  Nucleotide match
       Search for sequences matching the target sequence you supply,
       with no more than the number of mismatches you specify.  Both
       senses of the sequence you send are searched automatically.

am  :  Amino acid match
       Search for sequences matching the target sequence you supply,
       with no more than the number of mismatches you specify.  YOUR
       SEQUENCE MUST BE SENT IN SINGLE LETTER CODE.
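As a rough sketch of the two match searches (not Seqhunt's own algorithm, which this document does not describe), a query can be slid along each target sequence while counting differing positions; a window is reported when the count is no more than the allowed mismatches.  For nm, the reverse complement of the query is scanned as well, which is the usual meaning of searching both senses.  The function names are hypothetical, the sketch allows substitutions only (no gaps), and upper-casing amino acid input is our own assumption.

```python
from typing import List, Tuple

# Complement table for lowercase nucleotides; 'n' (unknown base) maps to itself.
COMPLEMENT = str.maketrans("acgtn", "tgcan")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a lowercase nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]

def hamming(a: str, b: str) -> int:
    """Count differing positions between two strings of equal length."""
    return sum(1 for x, y in zip(a, b) if x != y)

def scan(target: str, query: str, max_mismatches: int) -> List[Tuple[int, int]]:
    """Return (offset, mismatches) for each window of target close enough to query."""
    k = len(query)
    hits = []
    for i in range(len(target) - k + 1):
        mm = hamming(target[i:i + k], query)
        if mm <= max_mismatches:
            hits.append((i, mm))
    return hits

def nucleotide_match(target: str, query: str, max_mismatches: int) -> dict:
    """nm-style search: scan the query as sent and its reverse complement."""
    return {
        "forward": scan(target, query, max_mismatches),
        "reverse": scan(target, reverse_complement(query), max_mismatches),
    }

def amino_acid_match(target: str, query: str, max_mismatches: int) -> List[Tuple[int, int]]:
    """am-style search: single letter code, one orientation only.

    Upper-casing both strings is an assumption of this sketch.
    """
    return scan(target.upper(), query.upper(), max_mismatches)
```

For example, `nucleotide_match("ttggcccgctagcg", "cgggcc", 0)` finds no forward hit but reports the reverse-complement hit at offset 2.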


Restrictions
===========================================================================

To restrict a search, you may tailor the following fields to your 
specifications.  See the "valid restrictions" part of this document for 
the abbreviations used.

species   human, mouse, rabbit, etc.  or all
class     immunoglobulin, t-cell receptor, mhc, etc. or all
subclass  heavy chains, kappa light chains, tcr alpha, etc. or all

Instead of naming a particular species, class, or subclass, you may,
where allowed (see the valid restrictions at the end), specify "all" for
a field.  For example, {mouse, ig, all} searches mouse immunoglobulin
heavy, kappa, and lambda chains.  Similarly, {all, ig, hc} searches all
immunoglobulin heavy chains, regardless of species.  Each field can use
the restriction "all", with one exception: you may not specify "all" for
all three fields at once, since that would mean searching every species,
every class, and every subclass for a match.
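A client-side check of the three restriction fields might look like the following sketch.  It only enforces the rules stated above (each field filled in, and not "all" for all three); it does not check values against the table of valid restrictions, and the lower-casing and the name `check_restrictions` are assumptions.

```python
from typing import Tuple

def check_restrictions(species: str, cls: str, subclass: str) -> Tuple[str, ...]:
    """Normalize the species, class and subclass fields of a request.

    Raises ValueError if a field is empty or if all three are "all".
    """
    # Lower-casing is an assumption; the document's examples are lowercase.
    fields = tuple(f.strip().lower() for f in (species, cls, subclass))
    if any(not f for f in fields):
        raise ValueError("species, class and subclass must all be filled in")
    if all(f == "all" for f in fields):
        raise ValueError('"all" may not be used for all three fields at once')
    return fields
```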

At the end of this file are the current allowable restrictions for each 
field.


Formatting A Request (IMPORTANT)
===========================================================================

To keep things running as smoothly as possible, we have defined a single
format for requests.  It might be a good idea to keep a copy of it for
quick reference.  Any request that deviates from this format will be
discarded.  You can put as many requests in one mail message as you want,
as long as each is in the correct format.


Format of an E-mail search request of the Kabat database
--------------------------------------------------------

Form                       Comment
----                       -----------------------------

$Begin                     start of request
# comment                  optional one line comment
E-mail address             your return e-mail address
Search type                nsa,nsr,nsn,nm,asa,asr,asn,am
Species                    valid species or all
Class                      valid class or all
Subclass                   valid subclass or all
Mismatches                 mismatches allowed for nm,am
Search Pattern             pattern to look for in search
$End                       end of request

All fields in the form must be filled in, except the comment field, which 
is optional.
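If you prepare many requests, a small helper can lay them out in the field order of the form.  This Python sketch is hypothetical (the name `build_request` and its parameters are ours); it places the optional comment right after $Begin as the form shows, and fills the mismatches field with "X" for string searches, following the convention of the asa example.

```python
from typing import Optional

SEARCH_TYPES = {"nsa", "nsr", "nsn", "nm", "asa", "asr", "asn", "am"}

def build_request(address: str, search_type: str, species: str, cls: str,
                  subclass: str, pattern: str,
                  mismatches: Optional[int] = None,
                  comment: Optional[str] = None) -> str:
    """Return one request laid out in the order of the request form."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unknown search type: {search_type}")
    if search_type in ("nm", "am") and mismatches is None:
        raise ValueError("nm and am searches need a mismatch count")
    lines = ["$Begin"]
    if comment:
        lines.append(f"# {comment}")  # optional one-line comment
    lines += [
        address,
        search_type,
        species,
        cls,
        subclass,
        # String searches ignore this field, but it must not be left empty.
        "X" if mismatches is None else str(mismatches),
        pattern,
        "$End",
    ]
    return "\n".join(lines)
```

Since one message may carry several requests, the strings returned by `build_request` can simply be joined with blank lines between them.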

The $ signs before Begin and End are there for a reason!  Please don't
forget to put them in.  The $ before End lets the routine tell a line
reading 'end' as amino acids (E, N, D in single letter code) apart from
"$End" meaning "the end".

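One way to see why the $ matters: a reader that splits a message into requests only needs to watch for the exact lines "$Begin" and "$End", so a sequence line such as "end" is kept as data.  The sketch below is hypothetical, assumes each marker sits on a line of its own, and ignores any text outside the markers (for example a signature).

```python
from typing import List, Optional

def split_requests(message: str) -> List[List[str]]:
    """Collect the non-blank lines of each $Begin ... $End block in a message."""
    requests: List[List[str]] = []
    current: Optional[List[str]] = None
    for raw in message.splitlines():
        line = raw.strip()
        if line == "$Begin":
            current = []  # start a new request (an unfinished earlier block is dropped)
        elif line == "$End" and current is not None:
            requests.append(current)
            current = None
        elif current is not None and line:
            current.append(line)  # 'end' without '$' is ordinary data
    return requests
```
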
Here is an example of an e-mail message from someone who wants to look
for the pattern HIV in the antibody specificities of amino acid data.
The search is restricted to human immunoglobulins.  Since the request
covers all immunoglobulin chains (heavy chains, kappa light chains, and
lambda light chains), the subclass field is "all".  The symbol for
immunoglobulin is ig; these symbols can be found at the end of this file.
Since this request is not a sequence matching search, the mismatches
field is not used.  To keep the format, though, an "X" is put in (the
point is simply to fill the field with something).  Remember that the
mismatches field does matter for a sequence match (the next example).
In this example, the comment line holds any relevant information you want
to associate with the search.

Example request of asa (amino acid string antibody search)
----------------------------------------------------------

$Begin
tt at immuno.esam.nwu.edu         return address
# hiv antibodies               optional one line comment
asa                            search type
human                          species
ig                             class
all                            subclass
X                              mismatches (not used)
HIV                            target pattern
$End


This next example is a nucleotide sequence match over the mouse ig
kappa chains only, allowing 4 mismatches.  The nucleotide sequence
should contain no characters other than a, t, c, and g.  Dashes, periods,
and spaces will be removed.  You can put in n's for unknown bases, or
some other character, but make sure it is not one that will be removed.

Example request of nm (nucleotide match)
----------------------------------------

$Begin
tt at immuno.esam.nwu.edu         return address
nm                             search type
mouse                          species
ig                             ig class
kappa                          kappa subclass
4                              4 mismatches allowed
tggcccgctagcgcgcgatatatagcg    target pattern
$End
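The clean-up described before this example can be sketched as follows: remove dashes, periods, and whitespace, then reject anything that is not a, c, g, t, or n.  Lower-casing the input, the helper name `clean_nucleotides`, and raising an error (rather than silently dropping characters) are assumptions of this sketch, not documented Seqhunt behavior.

```python
import re

def clean_nucleotides(seq: str) -> str:
    """Strip dashes, periods and whitespace, then check what is left."""
    cleaned = re.sub(r"[-.\s]", "", seq).lower()
    # Anything other than a, c, g, t or n (unknown base) is treated as an error.
    bad = sorted(set(cleaned) - set("acgtn"))
    if bad:
        raise ValueError(f"unexpected characters in sequence: {''.join(bad)}")
    return cleaned
```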

In the above example, the target pattern could of course be much longer.
Some mailers allow only 80 characters on a line, which is why the target
pattern comes last: you can put it on as many lines as you want.  The
routine reads each line and glues them together (removing spaces, dashes,
and carriage returns).  You can also put the sequence on one continuous
line that wraps around if you prefer.  Just make sure there is nothing
other than sequence between your target sequence and the statement
"$End".
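Reading a request back into its fields, with a multi-line target pattern glued into one string, might look like this hypothetical sketch.  It skips the optional comment wherever it appears, and it applies the gluing only to nm and am requests; treating a string-search pattern as a single line is our assumption.

```python
from typing import Dict, List

# Fixed fields, in the order they appear in the request form.
FIELDS = ["address", "search_type", "species", "class", "subclass", "mismatches"]

def parse_request(body: List[str]) -> Dict[str, str]:
    """Map the lines between $Begin and $End onto the form's fields."""
    # Drop blank lines and the optional '#' comment wherever it appears.
    lines = [ln.strip() for ln in body if ln.strip() and not ln.strip().startswith("#")]
    if len(lines) <= len(FIELDS):
        raise ValueError("request is missing one or more fields")
    request = dict(zip(FIELDS, lines))
    rest = lines[len(FIELDS):]
    if request["search_type"] in ("nm", "am"):
        # Sequence searches: glue the remaining lines, dropping spaces and dashes.
        request["pattern"] = "".join(rest).replace(" ", "").replace("-", "")
    else:
        # Assumption: string searches take a single pattern line.
        request["pattern"] = rest[0]
    return request
```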

For amino acid sequence searches, the sequence sent must be in 
SINGLE LETTER CODE.
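If your protein sequence is written in three-letter code, it has to be converted before sending.  A minimal Python sketch covering the 20 standard amino acids might look like this; ambiguity codes such as Asx and Glx are deliberately left out, and the helper name `to_single_letter` is hypothetical.

```python
# Standard three-letter to one-letter amino acid codes (20 common residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_single_letter(three_letter: str) -> str:
    """Convert e.g. 'Glu-Val-Gln-Leu' or 'glu val gln leu' to 'EVQL'.

    Raises KeyError for any code not in the table above.
    """
    codes = [c for c in three_letter.replace(" ", "-").split("-") if c]
    return "".join(THREE_TO_ONE[c.capitalize()] for c in codes)
```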


Sending the Request
===========================================================================


Once the form is complete, send off the request to:

seqhunt at immuno.esam.nwu.edu

You should leave the Subject: line blank.
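For those scripting their requests, the following sketch wraps formatted requests in a mail message with Python's standard `email` package and an empty Subject: line.  The default recipient assumes the usual reading of " at " in the address above as "@"; `make_request_mail` and the sender address are hypothetical.  The resulting message could then be passed to `smtplib.SMTP(host).send_message(msg)`, where host is your local mail server.

```python
from email.message import Message

def make_request_mail(sender: str, requests: str,
                      to: str = "seqhunt@immuno.esam.nwu.edu") -> Message:
    """Wrap one or more formatted requests in a plain-text mail message."""
    msg = Message()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = ""  # the instructions ask for a blank Subject: line
    msg.set_payload(requests)
    return msg
```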


Processing the Request
===========================================================================

Because we are a small operation, we cannot process requests as soon as 
they are received.  To invoke seqhunt, a session of 
the PL/Prophet environment must be loaded.  Prophet is a memory hog and 
seqhunt, when added on to it, tremendously slows down our other daily tasks 
(like putting in the new sequences to search).  Anyway, here is how it will 
go until we get a big way-cool fast computer.  The requests will be logged 
throughout the night and day on day

