BLOCKS E-MAIL SEARCHER
S. Agus, J.G. Henikoff, S. Henikoff
Copyright 1993 Fred Hutchinson Center

Release 6.0 of the Blocks Database is now available for searching. Blocks are
short, multiply aligned, ungapped segments corresponding to the most highly
conserved regions of proteins. The database of blocks was constructed by
successive application of the automated PROTOMAT system to individual entries
in the PROSITE catalog of protein groups keyed to the SWISS-PROT protein
sequence databank. The rationale behind searching a database of blocks is
that information from multiply aligned sequences is present in a concentrated
form, which reduces background and increases sensitivity to distant
relationships. If a particular block scores highly, the query sequence may be
related to the group of sequences that the block represents.

Typically, a group of proteins has more than one region in common, and their
relationship is represented as a series of blocks separated by unaligned
regions. If a second block for the same group also scores highly in the
search, the evidence that the query is related to the group is strengthened;
a third high-scoring block strengthens it further, and so on.

The new database consists of 2302 blocks based on 619 protein groups
documented in PROSITE 10.00, which is keyed to SWISS-PROT 24. This represents
an 11% increase in the number of groups over the previous release, which was
based on PROSITE 9.00 keyed to SWISS-PROT 22.
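
The scoring idea described above (slide each ungapped block along the query,
keep its best score, then count how many blocks of a group score well) can be
illustrated with the toy Python sketch below. It is NOT the Blocks searcher's
actual algorithm: the uniform background, the flat pseudocount, the log2
scoring, the threshold and the function names are all assumptions made only
for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def block_to_pssm(segments, pseudocount=1.0):
    """Turn an ungapped block (equal-length aligned segments) into a
    per-column log-odds table.

    Assumption: uniform background (1/20 per residue) and a flat pseudocount;
    neither choice is taken from the Blocks searcher itself.
    """
    width = len(segments[0])
    if any(len(s) != width for s in segments):
        raise ValueError("block segments must all have the same width")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(segments) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(s[col] for s in segments)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def best_block_score(pssm, query):
    """Slide the block along the query without gaps; return the best score."""
    width = len(pssm)
    best = float("-inf")
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        # Residues outside the 20 standard ones contribute 0 (an assumption).
        score = sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
        best = max(best, score)
    return best

def blocks_hit(group_blocks, query, threshold):
    """Count how many of a group's blocks score at or above a (hypothetical)
    threshold; more hits means stronger evidence for membership."""
    return sum(best_block_score(block_to_pssm(b), query) >= threshold
               for b in group_blocks)
```

For example, with a made-up group of two tiny blocks,
`blocks_hit([["GKST", "GKTT", "GKSS"], ["DEAD", "DEAH", "DEAD"]], "MMGKSTLLDEADQQ", 2.0)`
counts both blocks as hits, while the same group scored against a run of
alanines yields none.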

For a detailed help file, send a blank e-mail message as follows:
To: blocks at howard.fhcrc.org
Subject: help

Or just send a protein or DNA sequence in FASTA, Genepro, GenBank, EMBL,
SWISS-PROT or PIR format (DNA is automatically translated in all six reading
frames for searching). Here is an example of a protein query in FASTA format:
To: blocks at howard.fhcrc.org
Subject:
>YCZ2_YEAST Hypothetical 40.1 KD protein in HMR 3' region
MKAVVIEDGKAVVKEGVPIPELEEGFVLIKTLAVAGNPTDWAHIDYKVGPQGSILGCDAA
GQIVKLGPAVDPKDFSIGDYIYGFIHGSSVRFPSNGAFAEYSAISTVVAYKSPNELKFLG
EDVLPAGPVRSLEGAATIPVSLTTAGLVLTYNLGLNLKWEPSTPQRNGPILLWGGATAVG
QSLIQLANKLNGFTKIIVVASRKHEKLLKEYGADQLFDYHDIDVVEQIKHKYNNISYLVD
CVANQNTLQQVYKCAADKQDATVVELTNLTEENVKKENRRQNVTIDRTRLYSIGGHEVPF
GGITFPADPEARRAATEFVKFINPKISDGQIHHIPARVYKNGLYDVPRILEDIKIGKNSG
EKLVAVLN
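
The six-frame translation applied to DNA queries can be sketched in Python as
follows. This sketch assumes the standard genetic code (NCBI translation
table 1) and writes X for any codon containing an ambiguous base; the
server's own translation rules are not described here, and the function names
are hypothetical.

```python
# Standard genetic code (NCBI table 1): codons enumerated with bases in the
# order T, C, A, G at each of the three positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Return three forward-strand translations (offsets 0, 1, 2) followed by
    three translations of the reverse complement."""
    dna = dna.upper().replace("U", "T")
    rev = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (dna, rev) for offset in range(3)]
```

For example, `six_frames("ATGGCC")` returns `['MA', 'W', 'G', 'GH', 'A', 'P']`:
the three forward frames, then the three frames of the reverse complement
GGCCAT. Stop codons appear as `*`.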

For any group represented in the current database, the blocks and the full
PROSITE documentation can be obtained by sending the command 'GET BL?????',
where ????? is the group's five-digit number, in the subject line of a blank
message. For example, 'get bl00044' retrieves the block(s) and the PROSITE.DAT
and PROSITE.DOC entries for PROSITE group PS00044.
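
Both kinds of request described above are ordinary e-mail messages, so they
can be prepared with Python's standard email package. The sketch below is
hypothetical: the function names, the 60-residue line width (chosen to match
the example query) and the accession check are our own choices, and it only
builds the messages; delivering them (for example with smtplib) is left out.

```python
import re
from email.message import EmailMessage

def fasta_record(name, description, sequence, width=60):
    """Format one FASTA record: a '>' header line, then fixed-width lines."""
    sequence = "".join(sequence.split())  # drop any embedded whitespace
    lines = [f">{name} {description}".rstrip()]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"

def search_request(to_addr, record):
    """Build (not send) a search request: sequence in the body, blank Subject."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = ""
    msg.set_content(record)
    return msg

def get_request(to_addr, prosite_accession):
    """Build (not send) a retrieval request such as 'GET BL00044'.

    Assumption: the BL number reuses the five digits of the PS accession,
    as in the PS00044 / bl00044 example. The message body is left empty.
    """
    match = re.fullmatch(r"PS(\d{5})", prosite_accession.strip().upper())
    if match is None:
        raise ValueError(f"not a PROSITE accession: {prosite_accession!r}")
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = f"GET BL{match.group(1)}"
    return msg
```

Keeping the FASTA formatting separate from message construction means the
same `fasta_record` output can also be pasted into a mail client by hand.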

The Blocks database has also been used to construct amino acid substitution
matrices, referred to as the 'BLOSUM' (for BLOcks SUbstitution Matrices)
series. We have obtained improved results using FASTA, BLASTP and other
programs with these matrices (see Henikoff, S. & Henikoff, J.G. (1992) "Amino
acid substitution matrices from protein blocks", PNAS 89:10915-10919). The
single best overall matrix, BLOSUM 62, is recommended for general use. Here
is BLOSUM 62 formatted for use with BLAST:
------------------------------cut here---------------------------------------
# BLAST version of BLOSUM 62 matrix made from BLOCKS v. 5.0 and
# scaled in half-bits. B, Z and X columns are based on amino acid
# frequencies from SwissProt 22. * column uses minimum score.
  A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
 -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
 -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
 -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4
  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
 -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4
 -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4
 -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4
 -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4
 -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4
 -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4
 -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4
 -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4
 -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4
  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4
  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4
 -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4
 -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4
  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4
 -2 -1  3  4 -3  0  1 -1  0 -3 -4  0 -3 -3 -2  0 -1 -4 -3 -3  4  1 -1 -4
 -1  0  0  1 -3  3  4 -2  0 -3 -3  1 -1 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1 -1 -1 -4
 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4  1
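
The matrix above is plain whitespace-separated text, so it is easy to read
programmatically. The Python sketch below assumes only the layout shown:
'#' comment lines, one header line of 24 residue symbols, then 24 unlabelled
rows in header order. It sums scores over two ungapped, equal-length
segments. The function names are hypothetical, and only the lines below the
"cut here" marker should be passed in.

```python
def parse_matrix(text):
    """Parse a matrix laid out as above into a {(row, col): score} dict.

    Expects '#' comment lines, one header line of residue symbols, and then
    one unlabelled row of integer scores per symbol, in header order.
    """
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    header, scores = rows[0], rows[1:]
    if len(scores) != len(header) or any(len(r) != len(header) for r in scores):
        raise ValueError("expected a square matrix matching the header")
    return {(a, b): int(value)
            for a, row in zip(header, scores)
            for b, value in zip(header, row)}

def ungapped_score(matrix, seq1, seq2):
    """Sum the pair scores (in half-bits) of two equal-length, gap-free
    segments."""
    if len(seq1) != len(seq2):
        raise ValueError("segments must have the same length")
    return sum(matrix[(a, b)] for a, b in zip(seq1, seq2))
```

Because the matrix is scaled in half-bits, dividing a total by 2 converts it
to bits; for instance, the W/W entry of 11 corresponds to 5.5 bits.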