
Searching the Kabat Database README

George Johnson george at immuno.esam.nwu.edu
Fri Nov 19 12:05:18 EST 1993


Searching the Kabat Database of Sequences of Proteins of Immunological 
Interest with Seqhunt.


Description
===========================================================================

Seqhunt is a set of routines we use here to search and analyze the Kabat 
database.  Recent modifications to the Seqhunt routines have allowed us 
to offer searches of the database to others by electronic mail.

Most of the database can be searched through the electronic mail 
interface.  Pseudogenes, D-minigenes, and J-minigenes currently cannot 
be searched.

**IMPORTANT**

Seqhunt is NOT an alignment program.  The aligned sequences in the
database were aligned by visual inspection; the alignment is forced
using the Kabat numbering system.  If your request finds matches, the
sequences that come back have been pre-aligned.  You may use these
returned aligned sequences as SUGGESTIONS for how to align your own
sequence.  For example, aligning the third complementarity-determining
region (CDR3) of the heavy chains depends on finding where the D
minigene region ends and the J minigene region begins.  This is not
always possible, and single base replacements in the rearranged
sequence compound the problem.  If a codon or amino acid does not match
perfectly, does it belong with the D or with the J?  That is a problem
you have to work out visually.  Your results may therefore come back
with many different aligned matches, which can be used as aids in the
alignment process.  Seqhunt was originally written for this purpose.


Types of Searches
===========================================================================

There are 8 search types available through electronic mail.  They are 
as follows:

nsa :  Nucleotide String Antibody
       Search for a pattern match with the desired antibody specificity 
       for immunoglobulins or classification for TCR.

asa :  Amino Acid String Antibody
       Search for a pattern match with the desired antibody specificity
       for immunoglobulins or classification for TCR.

nsr :  Nucleotide String Reference
       Search for a pattern match with the desired reference.

asr :  Amino Acid String Reference
       Search for a pattern match with the desired reference.

nsn :  Nucleotide String Name
       Search for a pattern match with the desired name.

asn :  Amino Acid String Name
       Search for a pattern match with the desired name.

None of the above searches requires the field to match your pattern
exactly.  For example, if you enter HIV as the search pattern in an nsa
search, all sequences whose antibody specificity contains HIV will be
returned.

nm  :  Nucleotide match
       Search for sequences matching the target sequence you supply,
       with no more than the number of mismatches you specify.  Both
       senses of the sequence you send will be searched automatically.

am  :  Amino acid match
       Search for sequences matching the target sequence you supply,
       with no more than the number of mismatches you specify.  YOUR
       SEQUENCE MUST BE SENT IN SINGLE LETTER CODE.
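The nm and am searches can be pictured as an ungapped scan that counts
substitutions at every position.  The sketch below is a minimal Python
illustration, not Seqhunt's actual code; the function names are
hypothetical.  It assumes plain strings without the Kabat alignment gaps,
treats an unknown base n as an ordinary character, and takes "both
senses" to mean the sequence as sent plus its reverse complement.

```python
# Complement of each base; the unknown base n pairs with itself.
COMPLEMENT = str.maketrans("acgtn", "tgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a lowercase nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatch_hits(target: str, subject: str, max_mismatches: int):
    """Yield (offset, mismatches) for every ungapped placement of target
    in subject that has at most max_mismatches substitutions."""
    t, s = target.lower(), subject.lower()
    for i in range(len(s) - len(t) + 1):
        diffs = sum(1 for a, b in zip(t, s[i:i + len(t)]) if a != b)
        if diffs <= max_mismatches:
            yield i, diffs


def nucleotide_match(target: str, subject: str, max_mismatches: int):
    """nm-style scan: try the target as sent ('+') and its reverse
    complement ('-'); returns (strand, offset, mismatches) tuples."""
    hits = [("+", i, d) for i, d in mismatch_hits(target, subject, max_mismatches)]
    rc = reverse_complement(target.lower())
    hits += [("-", i, d) for i, d in mismatch_hits(rc, subject, max_mismatches)]
    return hits
```

For an am-style search, mismatch_hits alone would be applied to single
letter amino acid strings, since a protein sequence has only one sense.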


Restrictions
===========================================================================

To restrict a search, you may tailor the following fields to your 
specifications.  See the "valid restrictions" part of this document 
for the abbreviations used.

species   human, mouse, rabbit, etc.  or all
class     immunoglobulin, t-cell receptor, mhc, etc. or all
subclass  heavy chains, kappa light chains, tcr alpha, etc. or all

In addition to naming a specific species, class, and subclass, you may,
where allowed (see the valid restrictions at the end), search "all" of a
field.  For example, you may search {mouse, ig, all}, meaning mouse
immunoglobulin heavy, kappa, and lambda chains.  Another example is
{all, ig, hc}, meaning all immunoglobulin heavy chains regardless of
species.  Each field can use the restriction all, with one exception:
you may not specify "all" for all three fields, since that would mean
searching all species, all classes, and all subclasses for a match.
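The rule for "all" can be expressed as a small check.  In this
hypothetical sketch, VALID holds only a few rows copied from the valid
restrictions table at the end of this file, and a (species, class,
subclass) triple is accepted when at least one row matches it; the real
service may apply further rules of its own.

```python
# Hypothetical excerpt of the valid restrictions table at the end of this
# file: (class, subclass) -> species allowed for that pair.
VALID = {
    ("ig", "hc"): {"mouse", "human", "rabbit", "rat", "various"},
    ("ig", "kappa"): {"mouse", "human", "rabbit", "rat", "various"},
    ("ig", "lambda"): {"human", "mouse", "rabbit", "rat", "various"},
    ("tcr", "beta"): {"human", "mouse", "rabbit", "rat", "various"},
}


def check_restrictions(species: str, cls: str, subclass: str) -> bool:
    """True if the triple is acceptable: 'all' may stand in for any
    field, but not for all three at once."""
    if species == cls == subclass == "all":
        return False  # searching everything is explicitly disallowed
    for (c, sc), allowed in VALID.items():
        if cls not in ("all", c) or subclass not in ("all", sc):
            continue  # this table row does not fit the request
        if species == "all" or species in allowed:
            return True
    return False
```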

At the end of this file are the current allowable restrictions for each 
field.


Formatting A Request (IMPORTANT)
===========================================================================

To keep things running as smoothly as possible, all requests must use a
single format.  It might be a good idea to keep a copy of it for quick
reference.  Requests that deviate from this format will be discarded.
You can put as many requests in a mail message as you want, as long as
each one is in the correct format.


Format of an E-mail search request of the Kabat database
--------------------------------------------------------

Form                       Comment
----                       -----------------------------

$Begin                     beginning of request
# comment                  optional one line comment
E-mail address             your return e-mail address
Search type                nsa,nsr,nsn,nm,asa,asr,asn,am
Species                    valid species or all
Class                      valid class or all
Subclass                   valid subclass or all
Mismatches                 mismatches allowed for nm,am
Search Pattern             pattern to look for in search
$End                       end of request

All fields in the form must be filled in, except the comment field,
which is optional.

The $ before Begin and End is there for a reason!  Please don't forget
to put it in.  The $ before End lets the routine tell the difference
between 'end' appearing in an amino acid sequence (E, N, D: Glu-Asn-Asp)
and 'end' meaning "the end" of the request.

Here is an example of an e-mail request from someone who wants to look
for the pattern HIV in the antibody specificities of amino acid data.
The search is restricted to human immunoglobulins.  Since the request
covers all immunoglobulins (heavy chains, kappa light chains, lambda
light chains), the subclass field is "all".  The symbol for
immunoglobulin is ig; these symbols can be found at the end of this
file.  Since this is not a sequence matching search, the mismatches
field is not used, but to keep the format an "X" is put in (the point
is to fill the field with something).  The mismatches field does matter
for a sequence match search (the next example).  The comment line holds
any relevant information you want to associate with the search.

Example request of asa (amino acid string antibody search)
----------------------------------------------------------

$Begin
# hiv antibodies               optional one line comment
tt at immuno.esam.nwu.edu         return address
asa                            search type
human                          species
ig                             class
all                            subclass
X                              mismatches (not used)
HIV                            target pattern
$End
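If you send many requests, the form can be assembled by a short script.
This is a hypothetical helper, not part of Seqhunt; it simply writes the
fields in the order of the form above, puts "X" in the mismatches field
for string searches, and the address used in the test is a placeholder.

```python
SEARCH_TYPES = {"nsa", "asa", "nsr", "asr", "nsn", "asn", "nm", "am"}


def build_request(address, search_type, species, cls, subclass, pattern,
                  mismatches=None, comment=None):
    """Assemble one request in the field order of the form above."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unknown search type: {search_type}")
    if search_type in ("nm", "am") and mismatches is None:
        raise ValueError("nm and am searches need a mismatch count")
    lines = ["$Begin"]
    if comment:
        if "\n" in comment:
            raise ValueError("the comment must fit on one line")
        lines.append("# " + comment)  # optional one line comment
    lines += [address, search_type, species, cls, subclass,
              # Unused for string searches, but the field must not be empty.
              "X" if mismatches is None else str(mismatches),
              pattern, "$End"]
    return "\n".join(lines)
```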


This next example is a nucleotide sequence match over the mouse ig
kappa chains only, allowing 4 mismatches.  The nucleotide sequence
should contain no characters other than a, t, c, and g.  Dashes,
periods, and spaces will be removed.  You can use n (or some other
letter) for unknown bases, but make sure it is not a character that
will be removed.

Example request of nm (nucleotide match)
----------------------------------------

$Begin
tt at immuno.esam.nwu.edu         return address
nm                             search type
mouse                          species
ig                             ig class
kappa                          kappa subclass
4                              4 mismatches allowed
tggcccgctagcgcgcgatatatagcg    target pattern
$End

In the above example, the target pattern could of course be much
longer.  Some mailers allow only 80 characters on a line, which is why
the target pattern is placed last: you can spread it over as many lines
as you want.  The routine reads each line and glues them together
(removing spaces, dashes, and carriage returns).  You can also put the
sequence on one continuous line that wraps around if you want.  Just
make sure there is nothing other than sequence between your target
sequence and the "$End" statement.
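The gluing step described above can be approximated before you send a
request, so you can check what the routine will see.  A minimal sketch
with hypothetical function names, assuming only spaces, dashes, periods,
and line endings are stripped, and that a, c, g, t, and n are the
characters you intend to send:

```python
def clean_pattern(lines):
    """Join the lines of a target pattern into one string, removing
    surrounding whitespace (including carriage returns) plus any spaces,
    dashes, and periods."""
    joined = "".join(line.strip() for line in lines)
    return "".join(ch for ch in joined if ch not in " -.")


def unexpected_characters(pattern, allowed="acgtn"):
    """Sorted list of characters in pattern outside the allowed set."""
    return sorted({ch for ch in pattern.lower() if ch not in allowed})
```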

For amino acid sequence searches, the sequence sent should be in 
SINGLE LETTER CODE.


Sending the Request
===========================================================================


Once the form is complete, send off the request to:

seqhunt at immuno.esam.nwu.edu

You should leave the Subject: line blank.


Processing the Request
===========================================================================

Your request will be processed when it is received, and the results will
be sent back as soon as the search has been performed.


Results of Your Request
===========================================================================

Your request will come back with a header, the date processed, a summary
of the request submitted, and any matches that were found.  If only the
header and request summary come back, then either no matches were found
or the request was not formatted correctly.  Below is partial output for
an amino acid match search with the restrictions mouse, ig, lambda and
5 mismatches allowed.  Note that although the request was sent in single
letter code, the aligned matches in the output are shown in three-letter
code.
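The conversion to three-letter code uses the standard amino acid
abbreviations.  A minimal sketch (the function name is hypothetical; it
handles only the 20 standard one-letter codes and raises KeyError on
anything else):

```python
# Standard one-letter to three-letter amino acid codes.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def to_triplet_code(seq: str) -> str:
    """Render a single letter sequence as space-separated triplets."""
    return " ".join(THREE_LETTER[ch] for ch in seq.upper())
```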

Entries are separated by a line of tildes (~~~~~~), and each entry has the following format:


NAME:   name of sequence
SEQ :   codon or amino acid sequence (with alignment information)
DIFF:   number of mismatches found (nm and am searches)
BEG :   match beginning position (Kabat's numbering)
END :   match ending position (Kabat's numbering)
ANTI:   antibody specificity(s)
REF :   sequence reference(s)
TAB :   Kabat table in which the sequence is located
~~~~~~~ end of match
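If you want to process returned results by program, the entry layout
can be read back into records.  This is a hypothetical sketch based only
on the sample output in this file: it assumes values start in column 8,
continuation lines are indented by 7 spaces, and trailing spaces at the
wrap points survive your mail system (the output wraps at a fixed width,
sometimes in the middle of a word).

```python
KEYS = {"NAME", "SEQ", "DIFF", "BEG", "END", "ANTI", "REF", "TAB"}


def parse_entries(text):
    """Split Seqhunt-style output into a list of {field: value} dicts.

    Assumes a key padded to 4 characters, ':' in column 5, the value
    starting in column 8, continuation lines indented by 7 spaces, and
    every entry closed by a line of '~' characters.
    """
    entries, current, key = [], {}, None
    for line in text.splitlines():
        if line.startswith("~~~"):
            if current:
                entries.append({k: v.strip() for k, v in current.items()})
            current, key = {}, None
        elif line[4:5] == ":" and line[:4].strip() in KEYS:
            key = line[:4].strip()
            current[key] = line[7:]
        elif key is not None and line.startswith(" " * 7):
            # The output wraps at a fixed width, sometimes mid-word
            # ("TH" + "R"), so the pieces are joined with no separator.
            current[key] += line[7:]
    return entries
```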


Request Sent (with a comment)
-----------------------------

To: seqhunt at immuno.esam.nwu.edu
Subject: 

$Begin
#part of a mouse lambda
tt at immuno.esam.nwu.edu
am
mouse
ig
lambda
5
QAVVTQESALTTSPGGTVILTCRSSTGAVTTSNYANWVQEKPDHLFTGLIGGTSNRAPGVPVRFSGSLIGD
KAALTITGAQTEDDAMYFCALWYSTH
$End


Results returned from the search
--------------------------------

Seqhunt results                              
================================================================================

Your seqhunt results are in either two or three parts.  There are THREE parts 
for nucleotide match (nm) and amino acid match (am); all other searches are in 
TWO parts.  The first two parts are the same for all searches:


    Part 1:  Summary of the request we received.
    Part 2:  Matches shown with alignment information.

And for nucleotide/amino acid matches:
                                                                           
    Part 3:  Matches shown unaligned.
                                                                           
Part 3 contains your search pattern on the top line followed by a listing of 
the matches found (in the same order as in part two), with a "." meaning a 
perfect match and anything else representing a mismatch.  At the end of each 
sequence in the figure, the name of the sequence is shown.   

If you have any questions or comments about the matches or the Seqhunt output,
please contact either:

George Johnson   george at immuno.esam.nwu.edu
Tai Te Wu        tt at immuno.esam.nwu.edu

New listings of allowable restrictions can be obtained from the above address 
or from ncbi.nlm.nih.gov in the directory  /repository/kabat in the file 
SEQHUNT_FIELDS.                
                                                                           
A complete set of instructions can be obtained by writing to us or by 
retrieving the file SEQHUNT_README also in the directory /repository/kabat.                            
================================================================================

DATE PROCESSED:  04/27/93


Request Received:

Search........:  Amino acid sequence match
Species.......:  mouse
Sequence class:  ig
Subclass......:  lambda
Mismatches....:  5
Your Comments.:  part of a mouse lambda
Search pattern:  qavvtqesalttspggtviltcrsstgavttsnyanwvqekpdhlftgliggtsnrapgvpvr
fsgsligdkaaltitgaqteddamyfcalwysth
Reverse Comp..:  Not used
=======

NAME:  E20'CL
SEQ :  GLN ALA VAL VAL THR GLN GLU SER ALA --- LEU THR THR SER PRO GLY GLY TH
       R VAL ILE LEU THR CYS ARG SER SER THR GLY ALA VAL --- --- --- THR THR 
       SER ASN TYR ALA ASN TRP VAL GLN GLU LYS PRO ASP HIS LEU PHE THR GLY LE
       U ILE GLY GLY THR SER ASN ARG ALA PRO GLY VAL PRO VAL ARG PHE SER GLY 
       SER LEU ILE GLY ASP LYS ALA ALA LEU THR ILE THR GLY ALA GLN THR GLU AS
       P ASP ALA MET TYR PHE CYS ALA LEU TRP TYR SER THR HIS 
DIFF:  0
BEG :  1
END :  95
ANTI:  ANTI-PHOSPHOCHOLINE PROTEIN, p-NITROPHENYL PHOSPHOCHOLINE
REF :  CHEN,C.,STENZEL-POORE,M.P. & RITTENBERG,M.B. (1991) J.IMMUNOL.,147,235
       9-2367.
TAB :  mouselambdalc
~~~~~~

NAME:  MOPC315
SEQ :  GLN ALA VAL VAL THR GLN GLU SER ALA --- LEU THR THR SER PRO GLY GLY TH
       R VAL ILE LEU THR CYS ARG SER SER THR GLY ALA VAL --- --- --- THR THR 
       SER ASN TYR ALA ASN TRP VAL GLN GLU LYS PRO ASP HIS LEU PHE THR GLY LE
       U ILE GLY GLY THR SER ASN ARG ALA PRO GLY VAL PRO VAL ARG PHE SER GLY 
       SER LEU ILE GLY ASP LYS ALA ALA LEU THR ILE THR GLY ALA GLN THR GLU AS
       P ASP ALA MET TYR PHE CYS ALA LEU TRP TYR SER THR HIS 
DIFF:  0
BEG :  1
END :  95
ANTI:  ANTI-DINITROPHENYL,TRINITROPHENYL,MENADIONE(VITAMIN K3)(BINDING CONSTA
       NT=5.4X10EXP5),EPSILON-DNP-L-LYS(BINDING CONSTANT=1.0X10EXP7), 2,4-DIN
       ITRONAPHTHOL(BINDING CONSTANT=2.5X10EXP6),EPSILON-DNP-AMINOCAPROATE(BI
       NDING CONSTANT=6.7X10EXP6)
REF :  DUGAN,E.S.,BRADSHAW,R.A.,SIMMS,E.S. & EISEN,H.N. (1973) BIOCHEMISTRY,1
       2,5400-5416. (CHECKED BY AUTHOR); BURSTEIN,Y. & SCHECHTER,I. (1977) BI
       OCHEM.J.,165,347-354; GAVISH,M.,ZAKUT,R.,WILCHEK,M. & GIVOL,D. (1978) 
       BIOCHEMISTRY,17,1345-1351.  (CHECKED BY AUTHOR 07/26/79)
TAB :  mouselambdalc
~~~~~~
 .
 .  
 . etc.
  
         [ I deleted some of the matches to save space ]

Unaligned matches
================================================================================

11 matches found.

         *         *         *         *         *         *         *         *
QAVVTQESALTTSPGGTVILTCRSSTGAVTTSNYANWVQEKPDHLFTGLIGGTSNRAPGVPVRFSGSLIGDKAALTITGA
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
d....................................i..........................................
.....................................i..........................................
..............r......................i..........................................
................................................................................
................................................................................

         *
QTEDDAMYFCALWYSTH
.................    E20'CL
.................    MOPC315
.................    MA8-13
.................    TEPC952
.................    W230'CL
.............frn.    1-54'CL
.............frn.    W108'CL
.............frn.    15-30'CL
.................    163.69'CL
.................    202.17'CL


In sequence matches with mismatches allowed, mismatches are shown in
lower case for amino acids (as above) and in upper case for nucleotide
searches.  The aligned and unaligned matches are listed in the same order.
The first line of the unaligned matches is your target sequence.  Each
sequence is compared with the target, and exact matches are shown as ".";
anything else is a mismatch.  Sometimes a space is shown.  This is also
counted as a mismatch, since that base or amino acid was not known or was
not sequenced.
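One row of the unaligned display can be reproduced as follows.  This is
a sketch under stated assumptions, not Seqhunt's own code: the function
name is hypothetical, the match is compared position by position with
the target, and positions missing from the match are shown as spaces.

```python
def dot_line(target: str, match: str, nucleotide: bool = False) -> str:
    """'.' where match agrees with target; otherwise the differing
    residue (lower case for amino acids, upper case for nucleotides);
    a position missing from match is shown as a space."""
    out = []
    for i, t in enumerate(target):
        m = match[i] if i < len(match) else " "
        if m == " ":
            out.append(" ")  # unknown or not sequenced
        elif m.lower() == t.lower():
            out.append(".")
        else:
            out.append(m.upper() if nucleotide else m.lower())
    return "".join(out)
```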

Help/Comments
===========================================================================

Requests for help and comments can be made to either Dr. Wu or myself.

 george at immuno.esam.nwu.edu
 tt at immuno.esam.nwu.edu

Please let us know if you receive weird messages back, such as ones
saying Memory Fault or other error text.  Such messages mean the program
crashed while attempting to process the request.

The table of valid search fields is updated periodically as things change
and deposited at ncbi.nlm.nih.gov in the anonymous ftp directory
/repository/kabat in the file SEQHUNT_FIELDS.  If you cannot find it or
don't know how to use ftp, send me a message requesting one and I'll send
you a copy of the table.

Also, please be aware that we have PostScript versions of the Kabat 
Database and dumps of the Kabat database in the above directory at ncbi.
These files are updated weekly.  The PostScript files can be printed on
any PostScript supporting printer.  For more information, poke around the
README files in /repository/kabat.

If you have a gopher client, you can gopher to ncbi.nlm.nih.gov and
change into repository kabat.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Valid Search Field Restrictions
===========================================================================

Here are the valid search fields as of 05/01/93.  These fields will most 
likely be around for a long time, with new additions every now and then. 
Be aware though that we might get the urge to shuffle things around, so
make sure the date on this table is not too old.

Here are the abbreviations.

Class
-----

ig       immunoglobulin
tcr      T-cell receptor for antigen
mhc      Major Histocompatibility Complex Class I
iregion  Major Histocompatibility Complex Class II
con      Constant Regions excluding ig heavy chain
chv      Immunoglobulin Heavy Chain constant regions
misc     Miscellaneous proteins associated with the immune system
ss       Signal sequences of all chains except miscellaneous sequences
miscss   Miscellaneous protein signal sequences


Subclass
--------

hc      immunoglobulin heavy chains
kappa   immunoglobulin kappa chains
lambda  immunoglobulin lambda chains
alpha   T-cell receptor for antigen alpha chains
beta    T-cell receptor for antigen beta chains
gamma   T-cell receptor for antigen gamma chains
delta   T-cell receptor for antigen delta chains
a       MHC class I A-locus
b       MHC class I B-locus
c       MHC class I C-locus
d       MHC class I D-locus
k       MHC class I K-locus
dpa     MHC class II DP alpha
dpb     MHC class II DP beta
dqa     MHC class II DQ alpha
dqb     MHC class II DQ beta
dra     MHC class II DR alpha
drb     MHC class II DR beta
aa      MHC class II A alpha
ab      MHC class II A beta
ea      MHC class II E alpha
eb      MHC class II E beta
adhe    Adhesion Proteins
b2mg    Beta-2-Microglobulin
comp    Complement
jch     J-chains
tsa     T-Cell Surface Antigens
thy     Thyone
miscp   Miscellaneous proteins


When you send a request to Seqhunt, the order for the restrictions is:

         species
         class
         subclass  

For ease in locating allowable restrictions, this listing is in a 
different order.  Make sure you put the restrictions in order though 
when you send the request.


Class                 Subclass                 Species
-----                 --------                 -------

ig                      hc                     mouse
                                               human
                                               cat
                                               chicken
                                               dog
                                               frog
                                               gopher
                                               rabbit
                                               rat
                                               shark
                                               various

ig                     kappa                   mouse
                                               human
                                               rabbit
                                               rat
                                               various

ig                    lambda                   human
                                               mouse
                                               chicken
                                               horse
                                               rabbit
                                               rat
                                               sheep
                                               various

tcr                   alpha                    human
                                               mouse
                                               bovine
                                               rabbit
                                               rat
                                               sheep
                                               various

tcr                   beta                     human
                                               mouse
                                               bovine
                                               chicken
                                               rabbit
                                               rat
                                               various

tcr                   delta                    human
                                               mouse
                                               rat
                                               sheep
                                               various

tcr                   gamma                    human
                                               mouse
                                               rat
                                               sheep
                                               various

mhc                    a                       human
mhc                    b                       human 
mhc                    c                       human
mhc                  various                   human
mhc                    d                       mouse
mhc                    k                       mouse
mhc                  various                   mouse
mhc                  various                   various

iregion               dpa                      human
iregion               dpb                      human
iregion               dqa                      human
iregion               dqb                      human
iregion               dra                      human
iregion               drb                      human
iregion             various *                  human *
iregion               aa                       mouse
iregion               ab                       mouse
iregion               ea                       mouse
iregion               eb                       mouse
iregion             various *                  various *

              * includes both alpha and beta chains

con                  alpha                     various
con                  beta                      various
con                  gamma                     various
con                  delta                     various
con                  kappa                     various
con                  lambda                    various

chv                  hc                        various

misc                 adhe                      various
misc                 b2mg                      various

misc                 comp                      human
                                               various

misc                 jch                       various
misc                 tsa                       various
misc                 thy                       various

misc                 miscp                     human
                                               mouse
                                               various

ss                   hc                        human 
                                               mouse
                                               various

ss                   kappa                     human
                                               mouse
                                               various

ss                   lambda                    human
                                               mouse
                                               various

ss                   alpha                     human
                                               mouse
                                               various

ss                   beta                      human
                                               mouse
                                               various

ss                   gamma                     human
                                               mouse
                                               various

ss                   delta                     human
                                               mouse
                                               various

miscss               mhc                       various
miscss               iregion                   various
miscss               adhe                      various
miscss               b2mg                      various
miscss               comp                      various
miscss               tsa                       various
miscss               miscp                     various


