New release of the Blocks Database for searching

Steven Henikoff henikoff at carson.u.washington.edu
Mon Jan 25 12:53:28 EST 1993


        ___________               ___________               ___________ 
       |\ __________\            |___________|            /__________ /|
       | |           |           |           |           |           | |
       | | **********|           |***********|           |********** | |
       | | * BLOCKS  |           |   E-MAIL  |           |SEARCHER * | |
       | | **********|           |***********|           |********** | |
        \|___________|___________|___________|___________|___________|/
                     |\ __________\         /__________ /|
                     | |           |       | Copyright | |
                     | |   S Agus  |       |    Fred   | |
                     | |JG Henikoff|       | Hutchinson| |
                     | | S Henikoff|       |   Center  | |
                      \|___________|       |____1993___|/

Release 6.0 of the Blocks Database is now available for searching. Blocks are 
short multiply aligned ungapped segments corresponding to the most highly 
conserved regions of proteins. A database of blocks has been constructed by 
successive application of the automated PROTOMAT system to individual entries
in the PROSITE catalog of protein groups keyed to the SWISS-PROT protein 
sequence databank. The rationale behind searching a database of blocks is 
that information from multiply aligned sequences is present in a concentrated
form, reducing background and increasing sensitivity to distant 
relationships. If a particular block scores highly, it is possible that the 
sequence is related to the group of sequences the block represents. 
Typically, a group of proteins has more than one region in common and their 
relationship is represented as a series of blocks separated by unaligned 
regions. If a second block for a group also scores highly in the search, the 
evidence that the sequence is related to the group is strengthened, and it is 
further strengthened if a third block also scores highly, and so on. The 
new database consists of 2302 blocks based on 619 protein groups documented 
in PROSITE 10.00, which is keyed to SWISS-PROT 24. This represents an 11% 
increase in the number of groups over the previous release, which was based
on PROSITE 9.00 keyed to SWISS-PROT 22.
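
To make the idea of scoring a sequence against a block concrete, here is a
toy sketch in Python. It is NOT the scoring method used by the searcher: the
log-odds weighting, the uniform background, the pseudocount, and the example
block and query are all assumptions chosen only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def block_to_profile(block, pseudocount=1.0):
    """Turn an ungapped block (equal-length segments) into per-column log-odds.

    Assumption: uniform background of 1/20 and a flat pseudocount. The real
    searcher may weight sequences and columns quite differently.
    """
    width = len(block[0])
    if any(len(segment) != width for segment in block):
        raise ValueError("all segments in a block must have the same width")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(width):
        counts = Counter(segment[col] for segment in block)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, sequence):
    """Slide the profile along the sequence; return (score, start) of the best
    ungapped window, or None if the sequence is shorter than the block."""
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids contribute 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Hypothetical four-column block and query, invented for this example.
toy_block = ["GDSG", "GDSG", "GESG", "GDTG"]
toy_profile = block_to_profile(toy_block)
toy_hit = best_window(toy_profile, "AAKLGDSGPLV")
```

Running the same scan with the profile of a second block from the same group
and checking that both hits score well, in the expected order along the
sequence, mirrors the reasoning above about evidence being strengthened by
additional blocks.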

For a detailed help file, send a blank e-mail message as follows:

To: blocks at howard.fhcrc.org
Subject: help

Or just send a protein or DNA sequence in FASTA, Genepro, GenBank, EMBL,
SWISS-PROT, or PIR format (DNA is automatically translated in all 6 reading
frames for searching). Here is an example of a protein query in FASTA format:

To: blocks at howard.fhcrc.org
Subject:
>YCZ2_YEAST   Hypothetical 40.1 KD protein in HMR 3' region
MKAVVIEDGKAVVKEGVPIPELEEGFVLIKTLAVAGNPTDWAHIDYKVGPQGSILGCDAA
GQIVKLGPAVDPKDFSIGDYIYGFIHGSSVRFPSNGAFAEYSAISTVVAYKSPNELKFLG
EDVLPAGPVRSLEGAATIPVSLTTAGLVLTYNLGLNLKWEPSTPQRNGPILLWGGATAVG
QSLIQLANKLNGFTKIIVVASRKHEKLLKEYGADQLFDYHDIDVVEQIKHKYNNISYLVD
CVANQNTLQQVYKCAADKQDATVVELTNLTEENVKKENRRQNVTIDRTRLYSIGGHEVPF
GGITFPADPEARRAATEFVKFINPKISDGQIHHIPARVYKNGLYDVPRILEDIKIGKNSG
EKLVAVLN
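
For readers unfamiliar with six-frame translation of DNA queries, the idea
can be sketched in Python as follows. This is a minimal illustration that
assumes the standard genetic code and writes unknown codons as 'X'; the
function names are hypothetical and it is not the server's own code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon and
    'X' marks any codon containing a non-ACGT character."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    dna = dna.upper().replace("U", "T")
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translations("ATGAAAGCT")
# frames["+1"] is "MKA"; frames["-1"] is "SFH"
```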

For any group represented in the current database, the blocks and full 
PROSITE documentation can be obtained by sending the command 'GET BL?????' 
in the subject line of a blank message, where ????? is the five-digit number
of the PROSITE accession. For example, 'get bl00044' retrieves the block(s)
and the PROSITE.DAT and PROSITE.DOC entries for PROSITE group PS00044.
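
The three kinds of message described above (help, search, and GET) can be
composed programmatically. The Python sketch below only builds the messages
and does not send them. The helper names are hypothetical, the address is
written with '@' on the assumption that "at" in this post stands for it, and
the 60-column line width simply mirrors the example query.

```python
import re
from email.message import EmailMessage

# Assumed '@' form of the address given in this announcement.
SERVER = "blocks@howard.fhcrc.org"


def _message(subject, body):
    """Build a message to the server; blank fields are simply left unset."""
    msg = EmailMessage()
    msg["To"] = SERVER
    if subject:
        msg["Subject"] = subject
    if body:
        msg.set_content(body)
    return msg


def help_request():
    """Blank message with 'help' as the subject."""
    return _message("help", "")


def get_request(prosite_accession):
    """Map a PROSITE accession such as PS00044 to the subject 'get bl00044'."""
    match = re.fullmatch(r"PS(\d{5})", prosite_accession.strip().upper())
    if not match:
        raise ValueError(f"not a PROSITE accession: {prosite_accession!r}")
    return _message(f"get bl{match.group(1)}", "")


def search_request(title, sequence, width=60):
    """FASTA-formatted query in the body, with the subject left blank."""
    residues = "".join(sequence.split())
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return _message("", f">{title}\n" + "\n".join(lines) + "\n")


request = get_request("PS00044")
query = search_request("EXAMPLE  hypothetical 70-residue query", "MKAVVIEDGK" * 7)
```

Handing the resulting objects to a mail client or to smtplib is left out on
purpose, since the delivery setup depends entirely on the local system.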

The Blocks Database has also been used to construct amino acid substitution
matrices, referred to as the 'BLOSUM' (for BLOcks SUbstitution Matrices)
series. We have obtained improved results using the FASTA, BLASTP and other
programs with these matrices (see Henikoff, S. & Henikoff, J.G. (1992) "Amino
acid substitution matrices from protein blocks" PNAS 89:10915-10919). The
single best overall matrix, BLOSUM 62, is recommended for general use. Here
is BLOSUM 62 formatted for use with BLAST:
------------------------------cut here---------------------------------------
#  BLAST version of BLOSUM 62 matrix made from BLOCKS v. 5.0 and 
#  scaled in half-bits. B, Z and X columns are based on amino acid 
#  frequencies from SwissProt 22. * column uses minimum score.
 A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4 
-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4 
-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4 
-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4 
 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4 
-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4 
-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4 
 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4 
-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4 
-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4 
-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4 
-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4 
-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4 
-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4 
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4 
 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4 
 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4 
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4 
-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4 
 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4 
-2 -1  3  4 -3  0  1 -1  0 -3 -4  0 -3 -3 -2  0 -1 -4 -3 -3  4  1 -1 -4 
-1  0  0  1 -3  3  4 -2  0 -3 -3  1 -1 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4 
 0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1 -1 -1 -4 
-4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4  1 
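
A matrix file in the layout above can be read with a few lines of Python.
The sketch below assumes that '#' lines are comments, that the first
remaining line lists the column symbols, and that the unlabeled rows follow
the same order as the columns. The function names are hypothetical, and the
three-letter excerpt is the A, R, N corner copied from the matrix above.

```python
def parse_blast_matrix(text):
    """Parse a square matrix whose rows are unlabeled and ordered like the
    column header. Returns a dict keyed by (row_symbol, column_symbol)."""
    lines = [
        line for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    symbols = lines[0].split()
    rows = lines[1:]
    if len(rows) != len(symbols):
        raise ValueError("expected one row per column symbol")
    matrix = {}
    for row_symbol, line in zip(symbols, rows):
        scores = [int(field) for field in line.split()]
        if len(scores) != len(symbols):
            raise ValueError(f"row {row_symbol} has the wrong number of scores")
        for col_symbol, score in zip(symbols, scores):
            matrix[(row_symbol, col_symbol)] = score
    return matrix


def ungapped_score(matrix, seq_a, seq_b):
    """Sum the matrix scores over two equal-length, ungapped segments."""
    if len(seq_a) != len(seq_b):
        raise ValueError("segments must have the same length")
    return sum(matrix[(a, b)] for a, b in zip(seq_a, seq_b))


EXCERPT = """\
# A, R, N corner of the BLOSUM 62 matrix above
 A  R  N
 4 -1 -2
-1  5  0
-2  0  6
"""

excerpt_matrix = parse_blast_matrix(EXCERPT)
half_bits = ungapped_score(excerpt_matrix, "ARN", "ANN")  # 4 + 0 + 6 = 10
bits = half_bits / 2  # the matrix is scaled in half-bits
```

Passing the full text between the "cut here" line and the end of the matrix
to parse_blast_matrix should yield all 24 x 24 entries, including the B, Z, X
and * symbols.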


