Searching the Kabat Database of Sequences of Proteins of Immunological
Interest with Seqhunt.
Description
===========================================================================
Seqhunt is a set of routines we use here to search and analyze the Kabat
database. Recent modifications to the Seqhunt routines have allowed us to
offer searches of the database to others by electronic mail. Most of the
database can be searched through the e-mail implementation; pseudogenes,
D-minigenes, and J-minigenes currently cannot.
**IMPORTANT**
Seqhunt is NOT an alignment program. The aligned sequences in the database
were aligned by visual inspection, with the alignment forced onto the Kabat
numbering system. If your request finds matches, the sequences returned are
already aligned. You may use these returned alignments as SUGGESTIONS for how
to align your own sequence. For example, aligning the third
complementarity-determining region (CDR3) of the heavy chains depends on
finding where the D minigene region ends and the J minigene region begins.
This is not always possible, and single-base replacements in the rearranged
sequence compound the problem: if a codon or amino acid does not match
perfectly, does it belong with the D or with the J? That is a question you
have to work out visually. Your results may therefore include many
differently aligned matches, which can serve as aids in the alignment
process. Seqhunt was originally written for this purpose.
Types of Searches
===========================================================================
Eight search types are available by electronic mail:
nsa : Nucleotide String Antibody
Search for a pattern match with the desired antibody specificity
for immunoglobulins or classification for TCR.
asa : Amino Acid String Antibody
Search for a pattern match with the desired antibody specificity
for immunoglobulins or classification for TCR.
nsr : Nucleotide String Reference
Search for a pattern match with the desired reference.
asr : Amino Acid String Reference
Search for a pattern match with the desired reference.
nsn : Nucleotide String Name
Search for a pattern match with the desired name.
asn : Amino Acid String Name
Search for a pattern match with the desired name.
None of the above searches requires an exact match of the whole field; they
look for the pattern anywhere in it. For example, if you enter HIV as the
search pattern in an nsa search, every sequence whose antibody specificity
contains the string HIV will be returned.
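The "contains" behaviour of the six string searches can be illustrated with a short sketch. It is not Seqhunt's code: the `Entry` record, the `string_search` helper, and the toy entries are all hypothetical, and this version compares case-sensitively because the text does not say whether Seqhunt ignores case.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical record layout; Seqhunt's real format is not described here.
    name: str
    reference: str
    specificity: str  # antibody specificity (Ig) or classification (TCR)


def string_search(entries, field, pattern):
    """Return entries whose `field` contains `pattern` anywhere (plain substring test)."""
    return [e for e in entries if pattern in getattr(e, field)]


# Invented toy entries, for illustration only.
entries = [
    Entry("entry-1", "ref-A", "anti-HIV gp120"),
    Entry("entry-2", "ref-B", "anti-DNA"),
    Entry("entry-3", "ref-C", "HIV-1 p24"),
]

print([e.name for e in string_search(entries, "specificity", "HIV")])
# prints ['entry-1', 'entry-3']
```

Both entries mentioning HIV are returned even though neither field is exactly "HIV", which is the point of the paragraph above.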
nm : Nucleotide match
Search for sequences matching the target sequence you supply,
with no more than the number of mismatches you specify. Both
senses (strands) of your sequence are searched automatically.
am : Amino acid match
Search for sequences matching the target sequence you supply,
with no more than the number of mismatches you specify.
YOUR SEQUENCE MUST BE SENT IN SINGLE-LETTER CODE.
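The idea behind nm and am can be sketched as a sliding-window comparison that counts mismatches. This sketch rests on several assumptions. It treats a mismatch as a substitution only, so there are no insertions or deletions. An `n` matches only another `n`. For nm, the "other sense" is taken to be the reverse complement of the target. The helper names are hypothetical. Seqhunt's actual scoring may differ.

```python
# Complement table for lower-case bases; n (unknown) maps to itself.
COMPLEMENT = str.maketrans("acgtn", "tgcan")


def reverse_complement(seq):
    """Reverse complement of a lower-case nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatch_hits(target, subject, max_mismatches):
    """Start positions where `target` lines up with `subject` with at most
    `max_mismatches` substitutions (ungapped, i.e. Hamming distance)."""
    hits = []
    for start in range(len(subject) - len(target) + 1):
        window = subject[start:start + len(target)]
        mismatches = sum(1 for a, b in zip(target, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


def nm_search(target, subject, max_mismatches):
    """Nucleotide match: search both senses of the target automatically."""
    return {
        "forward": mismatch_hits(target, subject, max_mismatches),
        "reverse": mismatch_hits(reverse_complement(target), subject, max_mismatches),
    }


print(nm_search("ggtac", "ttgtaccaa", 0))     # {'forward': [], 'reverse': [2]}
print(mismatch_hits("QVQLQ", "EVQLQQSGAE", 1))  # [0]  (am-style, single-letter code)
```

In the first call the target itself does not occur, but its reverse complement `gtacc` does, which is why nm searches both senses. The am-style call finds `EVQLQ`, one substitution away from `QVQLQ`.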
Restrictions
===========================================================================
To restrict a search, you can set the following fields to your
specifications. See the "valid restrictions" part of this document for the
abbreviations to use.
species human, mouse, rabbit, etc. or all
class immunoglobulin, t-cell receptor, mhc, etc. or all
subclass heavy chains, kappa light chains, tcr alpha, etc. or all
Besides naming a particular species, class, or subclass, you may, where
allowed (see the valid restrictions at the end), search "all" of a field.
For example, {mouse, ig, all} searches mouse immunoglobulin heavy, kappa, and
lambda chains, and {all, ig, hc} searches all immunoglobulin heavy chains,
regardless of species. Any field can be set to all, with one exception: you
may not specify "all" for all three fields at once, since that would mean
searching every species, class, and subclass for a match.
The current allowable restrictions for each field are listed at the end of
this file.
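You can check the one forbidden combination before mailing a request. The sketch below is a hypothetical client-side helper, not part of Seqhunt. It ignores case and surrounding spaces, and that is an assumption. It does not check the individual values against the valid restrictions, because that list lives at the end of the file.

```python
def check_restrictions(species, cls, subclass):
    """Return the three restriction fields, rejecting the disallowed all/all/all."""
    fields = (species, cls, subclass)
    # Assumption: "All" or " all " should be treated the same as "all".
    if all(f.strip().lower() == "all" for f in fields):
        raise ValueError('"all" may not be given for species, class, and subclass at once')
    return fields


print(check_restrictions("mouse", "ig", "all"))  # ('mouse', 'ig', 'all')
print(check_restrictions("all", "ig", "hc"))     # ('all', 'ig', 'hc')
```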
Formatting A Request (IMPORTANT)
===========================================================================
To keep things running as smoothly as possible, all requests must use a
single format. You may want to keep a copy of it for quick reference.
Requests in any other format will be discarded. You can put as many requests
in one mail message as you want, as long as each one is correctly formatted.
Format of an E-mail search request of the Kabat database
--------------------------------------------------------
Form              Comment
----              -----------------------------
$Begin            start of request
# comment         optional one-line comment
E-mail address    your return e-mail address
Search type       nsa, nsr, nsn, nm, asa, asr, asn, am
Species           valid species or all
Class             valid class or all
Subclass          valid subclass or all
Mismatches        number of mismatches allowed (nm, am)
Search Pattern    pattern to look for in search
$End              end of request
All fields in the form must be filled in, except the comment field, which
is optional.
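Here is a sketch of assembling a request in the field order of the table above. `build_request` is a hypothetical helper. It assumes the right-hand "Comment" column in the forms and examples is explanatory only and is not typed into the message. It uses the "X" placeholder for mismatches on searches other than nm and am. Requiring a whole number for nm and am is also an assumption.

```python
SEARCH_TYPES = {"nsa", "nsr", "nsn", "nm", "asa", "asr", "asn", "am"}


def build_request(address, search_type, species, cls, subclass,
                  pattern, mismatches="X", comment=None):
    """Assemble one $Begin ... $End request, one field per line."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unknown search type: {search_type}")
    # Assumption: nm and am need a whole-number mismatch count.
    if search_type in ("nm", "am") and not str(mismatches).isdigit():
        raise ValueError("nm and am searches need a numeric mismatch count")
    lines = ["$Begin"]
    if comment:  # the comment line is the only optional field
        lines.append(f"# {comment}")
    lines += [address, search_type, species, cls, subclass,
              str(mismatches), pattern, "$End"]
    return "\n".join(lines)


# "you at your.site" is a placeholder return address.
print(build_request("you at your.site", "asa", "human", "ig", "all", "HIV",
                    comment="hiv antibodies"))
```

Several such strings can be concatenated into one mail message, since the server accepts multiple correctly formatted requests per message.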
The $ before Begin and End is there for a reason! Please don't forget it.
The $ before End lets the routine distinguish 'end' as amino acids (E, N, D
in single-letter code) from 'end' meaning "the end".
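To see why the $ markers matter, consider a hypothetical parser that splits a mail message into requests. It is not Seqhunt's routine. It assumes markers are matched exactly and case-sensitively on lines of their own, and that text outside the markers is ignored. A pattern line that reads `end` stays as data because it is not `$End`.

```python
def split_requests(message):
    """Collect the non-blank lines between each $Begin and $End marker."""
    requests, current = [], None
    for raw in message.splitlines():
        line = raw.strip()
        if line == "$Begin":
            current = []
        elif line == "$End" and current is not None:
            requests.append(current)
            current = None
        elif current is not None and line:
            current.append(line)  # a plain "end" line is kept as data
    return requests


message = """Hello,
$Begin
me at my.site
am
human
ig
hc
2
end
$End
thanks
"""
print(split_requests(message))
# [['me at my.site', 'am', 'human', 'ig', 'hc', '2', 'end']]
```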
Here is an example of an e-mail message from someone who wants to look for
the pattern HIV in the antibody specificities of the amino acid data. The
search is restricted to human immunoglobulins. Because the request covers all
immunoglobulin chains (heavy chains, kappa light chains, and lambda light
chains), the subclass field is "all". The symbol for immunoglobulin is ig;
these symbols are listed at the end of this file. Since this is not a
sequence-match search, the mismatches field is not used, but to keep the
format an "X" is put in (the point is simply to fill the field with
something). Remember that the mismatches field does matter for a sequence
match, as in the next example. The comment line here holds whatever
information you want to associate with the search.
Example request of asa (amino acid string antibody search)
----------------------------------------------------------
$Begin
tt at immuno.esam.nwu.edu      return address
# hiv antibodies               optional one-line comment
asa                            search type
human                          species
ig                             class
all                            subclass
X                              mismatches (not used)
HIV                            target pattern
$End
The next example is a nucleotide sequence match against the mouse ig kappa
chains only, allowing 4 mismatches. The nucleotide sequence should contain no
characters other than a, t, c, and g; dashes, periods, and spaces will be
removed. You can put n's (or some other character) in for unknown bases, but
make sure the character you use is not one that will be removed.
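You may want to check a sequence before sending it, using the cleaning rule described above. In this hypothetical sketch, dashes, periods, and spaces are stripped as the text says. Any other character outside a, c, g, t, and n is reported, not removed. Treating n as the unknown-base character and expecting lower case are assumptions, so upper-case letters would be flagged.

```python
REMOVED = "-. "          # characters the text says are stripped
ALLOWED = set("acgtn")   # assumption: lower-case bases plus n for unknowns


def clean_nucleotides(seq):
    """Strip dashes, periods and spaces; list any remaining unexpected characters."""
    cleaned = seq.translate(str.maketrans("", "", REMOVED))
    unexpected = sorted(set(cleaned) - ALLOWED)
    return cleaned, unexpected


print(clean_nucleotides("gag.gtg cag-ctg"))  # ('gaggtgcagctg', [])
print(clean_nucleotides("gagxtg"))           # ('gagxtg', ['x'])
```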
Example request of nm (nucleotide match)
----------------------------------------
$Begin
tt at immuno.esam.nwu.edu      return address
nm                             search type
mouse                          species
ig                             class
kappa                          subclass
4                              mismatches allowed