New Kabat Database Server
george at immuno.bme.nwu.edu
Sun Jul 31 12:51:01 EST 1994
Just in time for Labor Day. Makes a great gift!
+ The Kabat Database of Sequences of Proteins +
+ of Immunological Interest +
+ For help, questions or comments please write: +
+ George Johnson george at immuno.bme.nwu.edu +
+ Tai Te Wu tt at immuno.bme.nwu.edu +
July 31, 1994, from George.
A new server is available at seqhunt2 at immuno.bme.nwu.edu. Sending mail
to this address with the single word "help" in the message body will
return this file.
This server is an improvement upon the seqhunt server running at
seqhunt at immuno.bme.nwu.edu.
This server allows the e-mail user to interface with the complete
database. All sequence classes and all annotations are searchable and
returnable. The query format is a simplification of the rather
restricted format of the original seqhunt server. Briefly, the server
allows you to make and/or/not constructed restrictions and allows
nucleotide and amino acid pattern matching with differences allowed.
The dataset searched is a raw archive of the database, and thus contains
all sorts of things previously unsearchable through the other servers.
Requests are processed when they are received. The average processing
time is about 2 minutes, depending on the complexity of the request.
The "hits" are returned in a different format than the original seqhunt
server. This format is being applied to other distributions of the
database. It is meant to be easier to read and easier to process by
computer programs. The format contains a vertical alignment of the
sequences returned, which is more familiar to users of the book. In
all cases, the length of a line in the returned record is 80 characters.
In the unlikely (huh?) event that the server crashes, you will get back
two consecutive lines like:
Processed: Sunday, July 31, 1994: 12:27:53 PM CDT
Server finished: Sunday, July 31, 1994: 12:28:35 PM CDT
When there is nothing in between these lines, then something ran amuck.
If you can, please send me the request you sent in or the time you sent
the request and we will begin finding the bug. Even if there are no
matches with your request, you will get back something that says no
matches were found.
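
If you process the returned mail with a program, the crash signature
described above can be checked for automatically. The following is only a
sketch: the function name is made up for illustration, and it assumes that
blank lines between the two timestamp lines count as "nothing in between".

```python
def looks_like_crash(reply_text):
    """Return True if a 'Processed:' line is directly followed by
    a 'Server finished:' line with no content between them."""
    # Assumption: blank lines do not count as content.
    lines = [ln.strip() for ln in reply_text.splitlines() if ln.strip()]
    for first, second in zip(lines, lines[1:]):
        if first.startswith("Processed:") and second.startswith("Server finished:"):
            return True
    return False


reply = (
    "Processed: Sunday, July 31, 1994: 12:27:53 PM CDT\n"
    "Server finished: Sunday, July 31, 1994: 12:28:35 PM CDT\n"
)
if looks_like_crash(reply):
    print("Empty reply: send George the request or the time it was sent.")
```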
FORMAT OF A QUERY
A query consists of two parts, separated by the word "Begin", which is
required in all requests. Before the word Begin, the things you can
put in are:
MAXDOC n      Specify the maximum number of hits that will be
              returned. The default is 20; the maximum is 75.
STARTDOC n    Specify the starting document to return. For
              queries which have many hits, you may want to
              return only 10 or so at a time. To get the first
              10, put in MAXDOC 10. To get the next 10,
              resubmit the search and put in STARTDOC 11.
Both STARTDOC and MAXDOC are optional; they do NOT have to be included.
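
For anyone generating requests by program, paging through a long hit list
might look like the sketch below. The function name is hypothetical. Two
things are assumptions, not statements from this help file: that each
keyword goes on its own line, and that MAXDOC should be repeated on later
pages (otherwise the default of 20 would presumably apply).

```python
def request_for_page(search_lines, page, page_size=10):
    """Build the text of one request asking for a single page of hits.

    search_lines: the lines that go after "Begin".
    page: 1 for the first page_size hits, 2 for the next, and so on.
    """
    if not 1 <= page_size <= 75:   # 75 is the documented maximum for MAXDOC
        raise ValueError("MAXDOC must be between 1 and 75")
    if page < 1:
        raise ValueError("page numbers start at 1")
    header = ["MAXDOC %d" % page_size]
    if page > 1:
        # Hits are counted from 1, so page 2 of 10 starts at document 11.
        header.append("STARTDOC %d" % ((page - 1) * page_size + 1))
    return "\n".join(header + ["Begin"] + list(search_lines)) + "\n"


print(request_for_page(["mouse and kappa"], page=2))
```

Each page is a separate mail message, since the search has to be
resubmitted for every STARTDOC value.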
After the word Begin, one or two things can be specified. These things
deal with restricting the search and doing pattern matches. The first
thing after the word Begin should be the restriction. The words in the
restriction are searched as regular expressions, so regular expression
syntax can be used in them, and a word matches anywhere it occurs:
Phospho would match phosphocholine, phosphoboo, phosphobobo, and
obophosphoagogo. The symbol -, though, is reserved for unary NOT.
Here are some examples:
mouse kappa light chains with phosphocholine specificity
The restriction would be:
mouse and kappa and phosphocholine
More complicated requests can be made:
mouse or human phosphocholine antibodies
The restriction would be:
(mouse or human) and phosphocholine
What about rat and rabbit antibodies, but no kappa's?
(rat and rabbit) and -kappa
The -kappa means NOT kappa. Note that the -'s must be distributed:
-(rat and rabbit) will not work, but (-rat and -rabbit) will work.
More examples will be described below.
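
To make the and/or/not rules concrete, here is a small sketch of how such
a restriction could be evaluated against the text of one record. It is an
illustration only, not the server's code. Assumptions: matching is
case-insensitive, "and" binds tighter than "or" when there are no
parentheses, and words contain no spaces or parentheses. All names are
made up for the example.

```python
import re


def restriction_matches(restriction, record):
    """Evaluate an and/or/not restriction against one record's text."""
    tokens = re.findall(r"[()]|[^\s()]+", restriction)
    pos = 0

    def peek():
        return tokens[pos].lower() if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("restriction ended unexpectedly")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "or":
            take()
            rhs = parse_and()      # always parse, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_word()
        while peek() == "and":
            take()
            rhs = parse_word()
            value = value and rhs
        return value

    def parse_word():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if tok == "-":
            # The - applies to single words only, so -(...) is rejected.
            raise ValueError("the - must be attached to a word")
        if tok.startswith("-"):
            return re.search(tok[1:], record, re.IGNORECASE) is None
        return re.search(tok, record, re.IGNORECASE) is not None

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result


record = "MOUSE KAPPA LIGHT CHAIN, ANTI-PHOSPHOCHOLINE"
print(restriction_matches("(mouse or human) and phosphocholine", record))
print(restriction_matches("(-rat and -rabbit)", record))
```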
After the restriction, which MUST occur all on one line, you may
optionally specify the pattern matching tools. They are:
#NM n                      #NM is nucleotide match with n
actgactagctacgtactgacgt    allowable mismatches

#AM n                      #AM is amino acid match with n
AKSKSLWKSKALAKDKELWS       allowable mismatches
Note that the sequence pattern goes on the line IMMEDIATELY following
the #AM or #NM line. This is for a reason: if you can only put 80
characters on a line, you can split the sequence across multiple lines.
The search pattern must be the LAST thing in the request, so
everything after the #AM or #NM line is treated as search pattern.
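
A program that writes requests could split a long pattern across lines
like this. It is a sketch with a made-up function name. The 80-character
width is taken from the text above. That the server simply joins the
continuation lines back together is an assumption.

```python
def pattern_section(kind, mismatches, sequence, width=80):
    """Return the lines for a #NM or #AM pattern, which must end the request."""
    if kind not in ("#NM", "#AM"):
        raise ValueError("kind must be '#NM' or '#AM'")
    lines = ["%s %d" % (kind, mismatches)]
    # The pattern starts on the very next line and may continue on
    # further lines, each at most `width` characters long.
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return lines


for line in pattern_section("#AM", 2, "AKSKSLWKSKALAKDKELWS"):
    print(line)
```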
The #AM and #NM are applied as the second part of an AND. For example:
mouse kappa's that have cagtacgtcagtcagtca with 3 allowable mismatches
mouse and kappa
#NM 3
cagtacgtcagtcagtca
That's all for the request. Mouse and kappa are ANDed first, then the
result is ANDed with the pattern match.
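
What "n allowable mismatches" means can be illustrated with a simple
sliding-window comparison. This is a sketch of the general idea only, with
a made-up function name. It counts substitutions at each position and
ignores insertions and deletions; whether the server does the same is not
stated here.

```python
def find_with_mismatches(pattern, sequence, max_mismatches):
    """Return (start, mismatches) for every window of `sequence` that
    differs from `pattern` in at most `max_mismatches` positions."""
    pattern = pattern.upper()
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        window = sequence[start:start + len(pattern)]
        mismatches = sum(1 for a, b in zip(pattern, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# One substitution (g -> c) is tolerated when one mismatch is allowed.
print(find_with_mismatches("cagt", "ttcactaa", 1))   # [(2, 1)]
```

The work grows with the total length of the sequences scanned, so applying
a restriction first keeps the number of windows to compare small.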
For the amino acid and nucleotide matches, you do not have to specify
a restriction. The default is a global search over the entire database.
Now, this might sound like the most prudent thing to do, but remember
that restricting things lowers the number of bases/amino acids the
program has to run through to find your matches. Unfortunately, the
machine the server runs on is also a machine in heavy use by us. So,
if at all possible, please restrict the amino acid and nucleotide
searches. There are plenty of legitimate reasons to globally
search everything, but if you are only interested in mouse kappa's, just
include the line:
mouse and kappa
and immediately you eliminate about 15000 sequences that have to be
rabbit and minigene
This search returned 72 hits applying the AND. There are rabbit
minigenes in the output, but also instances of some guy named
RABBITT who sequenced a chromosomal aberration which brought a Vh and
TCR JA MINIGENE close together. As you can see, the hits are not
always what you want. That tells you to be very careful about how
you format your request.
This search returned 4 matches, one of which is interesting for you
79 73 THR
80 74 SER
81 75 THR
82 76 SER
83 77 ILE
84 78 PHE PHE
85 79 TYR TYR
86 80 MET MET
87 81 GLU GLU
88 82 LEU
89 82A SER
90 82B ARG
91 82C LEU
92 83 ARG
That doesn't belong there!
To get a feeling for what the output is like, send in a request like this:
This request sends back some chicken sequences. To be more daring,
substitute your own favorite species!
Please let me know what you like/don't like about the server.
George Johnson george at immuno.bme.nwu.edu