Okay, this is a beta test. Please don't spaz out! This information
is from the Kabat database and must be referred to as such. The
search is performed by Seqhunt, unpublished results. Contact
george at immuno.esam.nwu.edu ONLY. Behave yourselves, or the beta
will fail!
Regards,
George
Searching the Kabat Database of Sequences of Proteins of Immunological
Interest with Seqhunt.
Description
===========================================================================
Seqhunt is a set of routines we use here to search and analyze the Kabat
database. Recent modifications to the Seqhunt routines have allowed us to
offer searches of the database to others through electronic mail.
Most of the database is accessible for searching through the electronic
mail implementation. Pseudogenes, D-minigenes, and J-minigenes are
currently not accessible for searches.
**IMPORTANT**
Seqhunt is NOT an alignment program. The aligned sequences in the database
were aligned by visual inspection; the alignment is forced using the
Kabat numbering system. If your request finds matches, the
sequences that come back have been pre-aligned. You may use these
returned aligned sequences as SUGGESTIONS as to how you might align your
sequence. For example, alignment of the third complementarity-determining
region (CDR3) of the heavy chains depends on finding where the D minigene
region ends and the J minigene region begins. This is not always possible,
and single base replacements in the rearranged sequence compound the
problem. So, if the codon or amino acid does not match perfectly, where
does it go: with the D or the J? That is a problem you have to work out
visually.
Your results then might come back with many different aligned matches
which may be used as aids in the alignment process. Seqhunt was
originally written for this purpose.
Obtaining access to Seqhunt (IMPORTANT)
===========================================================================
Seqhunt is available without charge to researchers and others interested
in locating information in the Kabat database. Because each search
requires computing time, we ask that you observe the following guidelines:
   1. Please send us your name and all electronic mail addresses which you
      might use to send requests. We need all the addresses for people who
      work with more than one machine or account.
   2. Please limit the number of requests you send to 10 a night. If you
      wish to send more requests, please get in touch with us personally
      so that we can perform the searches for you (we can do it more
      quickly).
Only those people sending in their name and electronic mail address will
have access to seqhunt. A file will be kept of all electronic mail
addresses we receive, and checked before a request is processed. If you
do not send in a valid electronic mail address, the request will not be
processed.
Once we have added your electronic mail address to the file, we will
notify you. After that, you may begin sending your requests.
Types of Searches
===========================================================================
There are 8 search types allowed through the electronic mail. They are
as follows:
nsa : Nucleotide String Antibody
      Search for a pattern match with the desired antibody specificity
      for immunoglobulins or classification for TCR.
asa : Amino Acid String Antibody
      Search for a pattern match with the desired antibody specificity
      for immunoglobulins or classification for TCR.
nsr : Nucleotide String Reference
      Search for a pattern match with the desired reference.
asr : Amino Acid String Reference
      Search for a pattern match with the desired reference.
nsn : Nucleotide String Name
      Search for a pattern match with the desired name.
asn : Amino Acid String Name
      Search for a pattern match with the desired name.
None of the above searches requires an exact match of the whole field.
For example, if you enter HIV as the search pattern in an nsa search, all
sequences containing the phrase HIV in the antibody specificity will
be returned.
nm  : Nucleotide Match
      Search for pattern matches with the target sequence you
      supply, with no more than the number of mismatches you specify.
      Both senses of the sequence you send are searched
      automatically.
am  : Amino Acid Match
      Search for pattern matches with the target sequence you
      supply, with no more than the number of mismatches you
      specify. YOUR SEQUENCE MUST BE SENT IN SINGLE LETTER CODE.
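To make the idea of "allowable mismatches" concrete, here is a minimal
sketch in Python. It is NOT the Seqhunt code, which is unpublished. It
assumes that a mismatch is a single-position substitution (no insertions
or deletions) and that "both senses" means the sequence as sent plus its
reverse complement. All function names are hypothetical.

```python
# Illustrative sketch only; not the Seqhunt implementation.
# Assumption: mismatches are substitutions counted position by position.
# Assumption: "both senses" = the target and its reverse complement.

COMPLEMENT = str.maketrans("acgtn", "tgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide string (a, c, g, t, n)."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def count_mismatches(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_matches(target: str, sequence: str, max_mismatches: int) -> list:
    """Return every start offset where target fits within the mismatch limit."""
    target = target.lower()
    sequence = sequence.lower()
    hits = []
    for start in range(len(sequence) - len(target) + 1):
        window = sequence[start:start + len(target)]
        if count_mismatches(target, window) <= max_mismatches:
            hits.append(start)
    return hits


def nucleotide_match(target: str, sequence: str, max_mismatches: int) -> dict:
    """Search one database sequence in both senses."""
    return {
        "forward": find_matches(target, sequence, max_mismatches),
        "reverse": find_matches(reverse_complement(target), sequence,
                                max_mismatches),
    }
```

An amino acid match would use find_matches alone, since a protein
sequence has no second sense to search.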
Restrictions
===========================================================================
To restrict a search, set the following fields to your specifications.
See the "valid restrictions" part of this document for the abbreviations
used.
   species     human, mouse, rabbit, etc. or all
   class       immunoglobulin, t-cell receptor, mhc, etc. or all
   subclass    heavy chains, kappa light chains, tcr alpha, etc. or all
In addition to specifying the species, class and subclass, you may,
when allowed (see valid restrictions at the end), search "all" of a field.
For example, you may search {mouse, ig, all} meaning mouse immunoglobulin
heavy, kappa, and lambda chains. Another example would be searching
{all, ig, hc} meaning search all immunoglobulin heavy chains, regardless
of the species. Each field accepts the restriction "all", with one
exception: you may not specify "all" for all three fields at once, since
that would mean searching all species, all classes, and all subclasses
for a match.
At the end of this file are the current allowable restrictions for each
field.
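The one forbidden combination is easy to check before a request is
mailed. The sketch below covers only that rule. It does not know the list
of valid abbreviations, and lowercasing the values is an assumption made
for the comparison. The function name is hypothetical.

```python
# Illustrative sketch only; checks the "not all three fields" rule.
# Assumption: values are compared case-insensitively.


def check_restrictions(species: str, cls: str, subclass: str) -> tuple:
    """Return the normalized (species, class, subclass) triple or raise."""
    fields = tuple(f.strip().lower() for f in (species, cls, subclass))
    if any(not f for f in fields):
        raise ValueError("species, class and subclass must all be filled in")
    if all(f == "all" for f in fields):
        raise ValueError('at most two of the three fields may be "all"')
    return fields
```

For example, check_restrictions("mouse", "ig", "all") is accepted, while
check_restrictions("all", "all", "all") raises ValueError.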
Formatting A Request (IMPORTANT)
===========================================================================
To keep things running as smoothly as possible, all requests must follow
a single format. It might be a good idea to keep a copy of it for quick
reference. Any request deviating from this format will be discarded.
You can put as many requests in a mail message as you want, as long as
each one is in the correct format.
Format of an E-mail search request of the Kabat database
--------------------------------------------------------
   Form              Comment
   ----------------  ------------------------------------
   $Begin            beginning of request
   # comment         optional one line comment
   E-mail address    your return e-mail address
   Search type       nsa, nsr, nsn, nm, asa, asr, asn, am
   Species           valid species or all
   Class             valid class or all
   Subclass          valid subclass or all
   Mismatches        mismatches allowed for nm, am
   Search Pattern    pattern to look for in search
   $End              end of request
All fields in the form must be filled, except the comment field which
is optional.
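To show how a mail message holding several requests might be taken apart,
here is a minimal sketch. It is not the Seqhunt routine. It assumes that
each line carries only the field value, that a line starting with "#" is
the comment wherever it appears in the block, that "$Begin" and "$End"
are recognized regardless of case, and that every line after the
mismatches field belongs to the search pattern. The names Request and
parse_requests are hypothetical.

```python
# Illustrative sketch only; not the Seqhunt parser.
from dataclasses import dataclass

# Order of the required fields, as listed in the form above.
FIELD_NAMES = ("email", "search_type", "species", "cls", "subclass",
               "mismatches")


@dataclass
class Request:
    comment: str
    email: str
    search_type: str
    species: str
    cls: str
    subclass: str
    mismatches: str
    pattern: str


def _build(lines):
    """Turn the lines of one $Begin/$End block into a Request, or None."""
    comments = [ln[1:].strip() for ln in lines if ln.startswith("#")]
    values = [ln for ln in lines if not ln.startswith("#")]
    if len(values) < len(FIELD_NAMES) + 1:
        return None  # a required field or the pattern is missing
    fields = dict(zip(FIELD_NAMES, values[:len(FIELD_NAMES)]))
    pattern = "".join(values[len(FIELD_NAMES):])  # glue pattern lines
    return Request(comment=comments[0] if comments else "",
                   pattern=pattern, **fields)


def parse_requests(body: str) -> list:
    """Extract every complete request from the text of a mail message."""
    requests = []
    block = None  # None while outside a $Begin/$End pair
    for raw in body.splitlines():
        line = raw.strip()
        marker = line.lower()
        if marker == "$begin":
            block = []
        elif marker == "$end":
            if block is not None:
                request = _build(block)
                if request is not None:
                    requests.append(request)
            block = None
        elif block is not None and line:
            block.append(line)
    return requests
```

Anything outside a $Begin/$End pair is ignored, and an incomplete block
is dropped, which mirrors the rule that malformed requests are discarded.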
The $ before begin and end are there for a reason! Please don't
forget to put them in. The $ before end is there so that the routine
can differentiate between 'end' being amino acids and 'end' meaning
"the end".
Here is an example of an e-mail message from someone wanting to look
for the pattern HIV in the antibody specificities of amino acid data.
The restrictions imposed are to look through only human immunoglobulins.
Since the request wants to look through all immunoglobulins (heavy
chains, kappa light chains, lambda light chains), the subclass field
will be "all". The symbol for immunoglobulin is ig. These symbols
can be found at the end of this file. Since this request is not a
sequence matching search, the mismatches field is not used. To
keep the format, though, put an "X" in it (the point is simply to
fill the field with something). The mismatches field does matter
for a sequence match, as in the second example below. The comment
line may hold any information you want to associate with the search.
Example request of asa (amino acid string antibody search)
----------------------------------------------------------
   $Begin
   tt at immuno.esam.nwu.edu      return address
   # hiv antibodies               optional one line comment
   asa                            search type
   human                          species
   ig                             class
   all                            subclass
   X                              mismatches (not used)
   HIV                            target pattern
   $End
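If you send requests often, a small script can fill in the form for you.
The sketch below is one way to do it. The function build_request and its
parameter names are hypothetical. The field order follows the form table,
with the optional comment directly after $Begin, and an "X" is written
into the mismatches field for searches that do not use it.

```python
# Illustrative sketch only; builds the text of one request.
from typing import Optional

SEARCH_TYPES = {"nsa", "nsr", "nsn", "nm", "asa", "asr", "asn", "am"}
MATCH_TYPES = {"nm", "am"}  # the two searches that use the mismatches field


def build_request(email: str, search_type: str, species: str, cls: str,
                  subclass: str, pattern: str,
                  mismatches: Optional[int] = None,
                  comment: Optional[str] = None) -> str:
    """Return one request as text, ready to paste into a mail message."""
    if search_type not in SEARCH_TYPES:
        raise ValueError("unknown search type: " + search_type)
    if search_type in MATCH_TYPES and mismatches is None:
        raise ValueError("nm and am searches need a mismatch count")
    lines = ["$Begin"]
    if comment:
        lines.append("# " + comment)
    lines += [
        email,
        search_type,
        species,
        cls,
        subclass,
        # Placeholder "X" keeps the field filled when it is not used.
        "X" if mismatches is None else str(mismatches),
        pattern,
        "$End",
    ]
    return "\n".join(lines) + "\n"
```

For example, build_request("user at example.org", "asa", "human", "ig",
"all", "HIV", comment="hiv antibodies") returns a block with the same
fields as the request shown above.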
This next example is a nucleotide sequence match over the mouse ig
kappa chains only, allowing 4 mismatches. The nucleotide sequence
should contain only the characters a, t, c, and g. Dashes, periods, and
spaces will be removed. You may use n (or some other character) for
unknown bases, but make sure it is not a character that gets removed.
Example request of nm (nucleotide match)
----------------------------------------
   $Begin
   tt at immuno.esam.nwu.edu      return address
   nm                             search type
   mouse                          species
   ig                             ig class
   kappa                          kappa subclass
   4                              4 mismatches allowed
   tggcccgctagcgcgcgatatatagcg    target pattern
   $End
The target pattern can, of course, be much longer than in the example
above. Some mailers allow only 80 characters per line, which is why the
target pattern comes last in the form. You can put the target pattern on
as many lines as you want. The routine will read each line and glue them
together (taking out spaces, dashes, and carriage returns). You can also
put the sequence on one continuous line that wraps around if you want.
Just make sure there is nothing that is not sequence between your target
sequence and the statement "$End".
For amino acid sequence searches, the sequence sent should be in
SINGLE LETTER CODE.
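The gluing described above can be reproduced locally to see what will be
searched and to keep each line short enough for your mailer. This is a
sketch, not the Seqhunt routine: it removes only the characters named in
this document (spaces, dashes, periods, and line breaks), the
60-character line width is an arbitrary choice, and treating a, c, g, t,
and n as the expected nucleotide alphabet is an assumption. The function
names are hypothetical.

```python
# Illustrative sketch only; mimics the clean-up described in the text.

# Characters the text says are removed from the pattern.
REMOVED = str.maketrans("", "", " -.\r\n")


def clean_pattern(raw: str) -> str:
    """Glue a multi-line pattern together, dropping removable characters."""
    return raw.translate(REMOVED)


def unexpected_characters(pattern: str, alphabet: str = "acgtn") -> set:
    """Return the characters in a cleaned pattern that are not in alphabet."""
    return set(pattern.lower()) - set(alphabet)


def wrap_pattern(pattern: str, width: int = 60) -> list:
    """Split a long pattern into lines of at most width characters."""
    return [pattern[i:i + width] for i in range(0, len(pattern), width)]
```

Cleaning the wrapped lines again gives back the original pattern, so
wrapping does not change what is searched.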
Sending the Request
===========================================================================
Once the form is complete, send off the request to:
seqhunt at immuno.esam.nwu.edu
You should leave the Subject: line blank.
Processing the Request
===========================================================================
Because we are a small operation, we do not have the ability to process
requests as soon as they are received. To invoke seqhunt, a session of
the PL/Prophet environment must be loaded. Prophet is a memory hog and
seqhunt, when added on to it, tremendously slows down our other daily tasks
(like putting in the new sequences to search). Anyway, here is how it will
go until we get a big way-cool fast computer. The requests will be logged
throughout the night and day on day