
New Kabat Database Server

George Johnson george at immuno.bme.nwu.edu
Sun Jul 31 12:51:01 EST 1994


Just in time for Labor Day.  Makes a great gift!


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                                       +
+               The Kabat Database of Sequences of Proteins             +
+                         of Immunological Interest                     +
+                                                                       +
+              For help, questions or comments please write:            +
+                                                                       +
+              George Johnson      george at immuno.bme.nwu.edu            +
+              Tai Te Wu               tt at immuno.bme.nwu.edu            +
+                                                                       +
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


July 31, 1994, from George.


GENERAL STUFF
-------------

A new server is available at seqhunt2 at immuno.bme.nwu.edu.  Sending mail
to this address with the single word "help" in the message body will 
return this file.

This server is an improvement upon the seqhunt server running at
seqhunt at immuno.bme.nwu.edu.  

This server allows the e-mail user to interface with the complete 
database.  All sequence classes and all annotations are searchable and
returnable.  The query format is a simplification of the rather 
restricted format of the original seqhunt server.  Briefly, the server
allows you to make and/or/not constructed restrictions and allows 
nucleotide and amino acid pattern matching with differences allowed.
The dataset searched is a raw archive of the database, and thus contains
all sorts of things previously unsearchable through the other servers.

Requests are processed when they are received.  The average processing
time is about 2 minutes, depending on the complexity of the request.

The "hits" are returned in a different format than the original seqhunt
server.  This format is being applied to other distributions of the
database.  It is meant to be easier to read and easier to process by
computer programs.  The format contains a vertical alignment of the 
sequences returned, which is more familiar to users of the book.  In
all cases, the length of a line in the returned record is 80 characters
or less.


REPORTING PROBLEMS
------------------

In the unlikely (huh?) event that the server crashes, you will get back
two consecutive lines like:

Processed:  Sunday, July 31, 1994:  12:27:53 PM CDT
Server finished:  Sunday, July 31, 1994:  12:28:35 PM CDT

When there is nothing in between these lines, then something ran amuck.
If you can, please send me the request you sent in or the time you sent
the request and we will begin finding the bug.  Even if there are no 
matches with your request, you will get back something that says no
matches were found.
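
As a rough illustration of the check described above, here is a small
Python sketch that scans a reply for the two timestamp lines and reports
whether anything sits between them.  The function name
reply_looks_crashed is hypothetical, and the sketch assumes the two lines
begin exactly with "Processed:" and "Server finished:" as in the sample.

```python
def reply_looks_crashed(reply_text):
    """Return True if only blank lines separate the two timestamp lines.

    Assumption: the lines start with "Processed:" and "Server finished:",
    as in the sample reply.  Returns None if either line is missing.
    """
    lines = [line.strip() for line in reply_text.splitlines()]
    # Locate the first "Processed:" line.
    start = next(
        (i for i, line in enumerate(lines) if line.startswith("Processed:")),
        None,
    )
    if start is None:
        return None
    # Locate the first "Server finished:" line after it.
    end = next(
        (i for i in range(start + 1, len(lines))
         if lines[i].startswith("Server finished:")),
        None,
    )
    if end is None:
        return None
    # A crash shows up as no non-blank content between the two lines.
    return not any(lines[start + 1:end])
```

A reply that contains even a "no matches were found" message between the
two lines is reported as not crashed.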


FORMAT OF A QUERY
-----------------

A query consists of two parts.  The two parts are separated by the word
"Begin", which is required in all requests.  Before the word Begin, you
can include:

MAXDOC n           specify the maximum number of hits that will be 
                   returned.  The default is 20, the maximum is 75.

STARTDOC n         specify the starting document to return.  For 
                   queries which have many hits, you may want to
                   return only 10 or so at a time.  To get the first
                   10, put in MAXDOC 10.  To get the next 10,
                   resubmit the search and put in STARTDOC 11.

Both STARTDOC and MAXDOC are optional; they do NOT have to be included.
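
The paging scheme above can be sketched in Python as follows.  This only
builds the text of the message body; it does not send mail.  The function
names build_request and paged_requests are hypothetical, the keyword
spelling MAXDOC follows the definition above, and the lower bound of 1 on
MAXDOC is an assumption (only the maximum of 75 is stated).

```python
def build_request(restriction, maxdoc=None, startdoc=None):
    """Build a request body: optional header lines, Begin, restriction."""
    header = []
    if maxdoc is not None:
        # The documented maximum is 75; the minimum of 1 is assumed.
        if not 1 <= maxdoc <= 75:
            raise ValueError("MAXDOC must be between 1 and 75")
        header.append(f"MAXDOC {maxdoc}")
    if startdoc is not None:
        header.append(f"STARTDOC {startdoc}")
    return "\n".join(header + ["Begin", restriction]) + "\n"


def paged_requests(restriction, page_size, pages):
    """Return one request body per page of page_size hits."""
    requests = []
    for page in range(pages):
        # The first page needs no STARTDOC; later pages start at 11, 21, ...
        startdoc = None if page == 0 else page * page_size + 1
        requests.append(build_request(restriction, page_size, startdoc))
    return requests
```

For example, paged_requests("mouse and kappa", 10, 2) yields one body with
MAXDOC 10 and a second with MAXDOC 10 and STARTDOC 11.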

After the word Begin, one or two things can be specified: a restriction
on the search and a pattern match.  The first thing after the word Begin
should be the restriction.  The words in the restriction are searched as
regular expressions, so Phospho would match phosphocholine, phosphoboo,
phosphobobo, obophosphoagogo.  The regular expression syntax can be used
in these patterns.  The symbol -, though, is reserved for unary NOT.
Here are some examples:

mouse kappa light chains with phosphocholine specificity

The restriction would be:

     mouse and kappa and phosphocholine

More complicated requests can be made:

mouse or human phosphocholine antibodies

The restriction would be:
     
     (mouse or human) and phosphocholine

What about rat or rabbit antibodies, but no kappa's?

     (rat or rabbit) and -kappa

The -kappa means NOT kappa.  Note that the -'s must be distributed:
-(rat or rabbit) will not work, but (-rat and -rabbit) will work.

More examples will be described below.
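
To make the matching rules concrete, here is a minimal Python sketch of
how a single restriction word might be tested against the text of a
record.  It is not the server's code.  The function name word_matches is
hypothetical, and case-insensitive matching is an assumption.

```python
import re


def word_matches(word, record_text):
    """Test one restriction word against a record's text.

    The word is treated as a regular expression that may match anywhere
    in the text.  A leading "-" means NOT.  Ignoring case is an
    assumption of this sketch.
    """
    negate = word.startswith("-")
    if negate:
        word = word[1:]
    found = re.search(word, record_text, re.IGNORECASE) is not None
    return found != negate


record = "MOUSE KAPPA LIGHT CHAIN, anti-phosphocholine"

# (mouse or human) and phosphocholine
wanted = (
    (word_matches("mouse", record) or word_matches("human", record))
    and word_matches("phosphocholine", record)
)

# -kappa rejects this record because it mentions kappa
no_kappa = word_matches("-kappa", record)
```

Here wanted is True and no_kappa is False.  Because the words are not
anchored, phospho also matches inside obophosphoagogo.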

After the restriction, which MUST occur all on one line, you can
optionally specify a pattern match.  The pattern matching tools are:

#NM n                            #NM is nucleotide match with n 
actgactagctacgtactgacgt          allowable mismatches

#AM n                            #AM is amino acid match with n 
AKSKSLWKSKALAKDKELWS             allowable mismatches


Note the sequence pattern goes on the line IMMEDIATELY following the
#NM or #AM line.  This is for a reason: if you can only put 80
characters on a line, you can split the sequence across multiple lines.
The search pattern must be the LAST thing in the request, so everything
after the #AM or #NM line is treated as search pattern.

The #AM and #NM are applied as the second part of an AND.  For example:

mouse kappa's that have cagtacgtcagtcagtca with 3 allowable mismatches

Begin
mouse and kappa
#NM 3
cagtacgtcagtcagtca

That's all for the request.  Mouse then Kappa are ANDed, then another
AND occurs with the pattern match.
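
For readers who want to see what matching with allowable mismatches can
look like, here is a minimal Python sketch.  It is not the server's
algorithm.  It assumes a mismatch means a substituted character (no
insertions or deletions) and that case is ignored; neither is stated
above.  The function name find_with_mismatches is hypothetical.

```python
def find_with_mismatches(pattern, sequence, max_mismatches):
    """Return (start, mismatches) for every window within the limit.

    Assumptions: mismatches are substitutions only, and case is ignored.
    Whitespace in the pattern is removed, so a pattern split across
    several lines is joined back into one string.
    """
    pattern = "".join(pattern.split()).lower()
    sequence = sequence.lower()
    if not pattern:
        return []
    hits = []
    # Slide a window of the pattern's length along the sequence.
    for start in range(len(sequence) - len(pattern) + 1):
        window = sequence[start:start + len(pattern)]
        mismatches = sum(1 for a, b in zip(pattern, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

A call such as find_with_mismatches("cagtacgtcagtcagtca", seq, 3)
corresponds to the #NM 3 request above, applied to one sequence seq.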

For the amino acid and nucleotide matches, you do not have to specify
a restriction.  The default is a global search over the entire database.
Now, this might sound like the most prudent thing to do, but remember
that restricting things lowers the number of bases/amino acids the
program has to run through to find your matches.  Unfortunately, the
machine the server runs on is also a machine in heavy use by us.  So,
if at all possible, please restrict the amino acid and nucleotide
searches.  There are plenty of legitimate reasons to globally
search everything, but if you are only interested in mouse kappa's, just
include the line:
  mouse and kappa
and immediately you eliminate about 15000 sequences that have to be
searched through.


Examples:

Begin
rabbit and minigene

This search returned 72 hits applying the AND.  There are rabbit
minigenes in the output, but also instances of some guy named
RABBITT who sequenced a chromosomal aberration which brought a Vh and
TCR JA MINIGENE close together.  As you can see, the hits are not
always what you want.  That tells you to be very careful about how
you format your request.


MAXDOC 5
Begin
human
#AM 0
FYME

This search returned 4 matches, one of which is interesting for you
FYME buffs.

   79    73    THR       
   80    74    SER       
   81    75    THR       
   82    76    SER       
   83    77    ILE       
   84    78    PHE  PHE  
   85    79    TYR  TYR  
   86    80    MET  MET  
   87    81    GLU  GLU  
   88    82    LEU       
   89   82A    SER       
   90   82B    ARG       
   91   82C    LEU       
   92    83    ARG       

That doesn't belong there!

To get a feel for what the output is like, send in a request like this:

Begin
chicken

This request sends back some chicken sequences.  To be more daring, 
substitute your own favorite species!

---------


Please let me know what you like/don't like about the server.

George Johnson      george at immuno.bme.nwu.edu


