MATCH-BOX
Match-Box Server 1.1: multiple sequence alignment server.
Guy Baudoux, Isabelle Reginster, Pascal Briffeuil,
Eric Depiereux and Ernest Feytmans.
Laboratory of Structural Molecular Biology,
University of Namur, Belgium
Rue de Bruxelles, 61 : Tel: +32-81-72-44-15
5000 Namur, Belgium : Fax: +32-81-72-44-15
E-mail: matchbox-help at biq.fundp.ac.be
Project supported by the Walloon Government,
Federal State of Belgium.
PRESENTATION:
We are pleased to announce the availability of a new sequence alignment
server, based on the Match-Box software developed by Drs. Eric Depiereux
and Ernest Feytmans (1,2).
The Match-Box software provides protein sequence alignment tools based on
strict statistical criteria. The method circumvents the gap penalty
requirement: in the Match-Box method the gaps are the result of the
alignment and not a governing parameter of the matching procedure.
The method produces reliable results, as assessed by tests performed on
protein families of known structure and low sequence similarity.
A reliability score is computed in relation to a similarity threshold
that is progressively raised to extend the aligned regions to their
maximal length. The score obtained at each position of the final
alignment is printed below the sequences, so that each aligned region
can be read with its own level of confidence.
Several additional outputs present pairwise similarity analyses, which
help to delineate relevant subsets of related sequences and to avoid
aligning unrelated sequences.
EXAMPLE:
Lysozyme and alpha-lactalbumin form a well-known protein family,
discussed in depth by H.A. McKenzie and F.H. White (3). The sequences of
the structures 1ALC, 1LZ1, 2LZ2 and 2LZT were aligned with the Match-Box
server, which gives the following result:
Sequences number, length and name
_________________________________
1  122  1ALC    2  130  1LZ1    3  129  2LZ2    4  129  2LZT
          10        20        30        40        50        60        70
           +         +         +         +         +         +         +
1 kqftkcelsqnlyd--idgygrialpelictMFhtsgydtqai--vendesteyglfqisnalwckssqs
2 kvfercelartlkrLGmdgyrgislanwmclAKwesgyntratNYnagdrstdygifqinsrywcndgkt
3 kvygrcelaaamkrLGldnyrgyslgnwvcaAKfesnfnthatN-rntdgstdygilqinsrwwcndgrt
4 kvfgrcelaaamkrHGldnyrgyslgnwvcaAKfesnfntqatN-rntdgstdygilqinsrwwcndgrt
  11111111111113  333333333444444  2222222222  1111111111111111111111111

          80        90       100       110       120       130       140
           +         +         +         +         +         +         +
1 pqsrnicditcdkflddditddimcakkild-ikgidywiahkalctEKLEQWLCEK---
2 pgavnachlscsallqdniadavacakrvvrDpqgirawvawrnrcqNRDVRQYVQGCGV
3 pgsknlcnipcsallssditasvncakkiasGgngmnawvawrnrckGTDVHAWIRGCRL
4 pgsrnlcnipcsallssditasvncakkivsDgngmnawvawrnrckGTDVQAWIRGCRL
  1111111111111111111111111111111 111111111111111
The Match-Box program tries to find "boxes" (segments that are similar
in all sequences submitted by the user): residues included in the boxes
are printed in lowercase. Residues in uppercase are not aligned, and
gaps are placed arbitrarily before the next box.
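
The case convention can be read back mechanically. The following Python
sketch is not part of the Match-Box software: the function name
find_boxes is hypothetical, and it assumes that a column belongs to a
box exactly when every sequence shows a lowercase letter there. The rows
are the first 70 columns of the example alignment.

```python
def find_boxes(rows):
    """Return 1-based, inclusive (start, end) column ranges in which
    every row holds a lowercase residue (assumed box convention)."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same length")
    boxes = []
    start = None  # 0-based start of the box currently being scanned
    for col in range(width):
        in_box = all(row[col].islower() for row in rows)
        if in_box and start is None:
            start = col
        elif not in_box and start is not None:
            boxes.append((start + 1, col))
            start = None
    if start is not None:
        boxes.append((start + 1, width))
    return boxes


rows = [
    "kqftkcelsqnlyd--idgygrialpelictMFhtsgydtqai--vendesteyglfqisnalwckssqs",
    "kvfercelartlkrLGmdgyrgislanwmclAKwesgyntratNYnagdrstdygifqinsrywcndgkt",
    "kvygrcelaaamkrLGldnyrgyslgnwvcaAKfesnfnthatN-rntdgstdygilqinsrwwcndgrt",
    "kvfgrcelaaamkrHGldnyrgyslgnwvcaAKfesnfntqatN-rntdgstdygilqinsrwwcndgrt",
]
boxes = find_boxes(rows)
print(boxes)  # [(1, 14), (17, 31), (34, 43), (46, 70)]
```

Gap characters and uppercase residues both fail the lowercase test, so
the unaligned stretches between boxes are skipped without special
handling.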
A reliability score is written below each position in the boxes. It is
related to the statistical significance of the alignment at this
position: the lowest scores correspond to the highest reliability of
the alignment.
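
As an illustration of how the scores might be read per box, the Python
sketch below summarises the score line of the first 70 columns of the
example. It rests on several assumptions that the listing does not
state: the score line is column-aligned with the sequence rows, each
score is a single digit, and the box ranges are those visible in the
example. The name scores_by_box is hypothetical.

```python
def scores_by_box(score_line, boxes):
    """Map each 1-based (start, end) box to its per-position scores.

    Assumes one digit per alignment column and blanks outside boxes.
    """
    result = {}
    for start, end in boxes:
        segment = score_line[start - 1:end]
        if len(segment) != end - start + 1 or not segment.isdigit():
            raise ValueError(f"no complete score run under box {start}-{end}")
        result[(start, end)] = [int(ch) for ch in segment]
    return result


# Score line of the first block of the example, rebuilt column by column
# (the two-character row label is left out).
score_line = ("1" * 13 + "3" + "  " + "3" * 9 + "4" * 6 + "  "
              + "2" * 10 + "  " + "1" * 25)
boxes = [(1, 14), (17, 31), (34, 43), (46, 70)]

per_box = scores_by_box(score_line, boxes)
# Lower is more reliable, so the highest score in a box marks its weakest spot.
worst = {box: max(values) for box, values in per_box.items()}
least_reliable = max(worst, key=worst.get)
print(worst)
print(least_reliable)  # (17, 31)
```

Under these assumptions the second box (columns 17 to 31) carries the
highest scores and is therefore the least reliable region of this block.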
ACCESSIBILITY:
The Match-Box server is available at the Web site:
http://www.fundp.ac.be/sciences/biologie/bms/matchbox_submit.html
Using this web page, you can submit up to 50 protein sequences of no
more than 2000 amino acids each and receive by e-mail two output
listings, resulting from two different analyses:
* explore : analyses the global similarities between the sequences.
* align : produces boxes (blocks) detected in the sequences and
arranged in the form of a multiple alignment.
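
Before submitting, a data set can be checked locally against the limits
quoted above (at most 50 sequences, at most 2000 amino acids each). This
Python sketch is only a convenience under those two stated limits; the
names check_submission, MAX_SEQUENCES and MAX_LENGTH are hypothetical
and the server may apply further checks not described here.

```python
MAX_SEQUENCES = 50   # limit quoted in the announcement
MAX_LENGTH = 2000    # amino acids per sequence, as quoted


def check_submission(sequences):
    """Return a list of problems; an empty list means the data set
    respects both quoted limits. `sequences` maps names to residues."""
    problems = []
    if len(sequences) > MAX_SEQUENCES:
        problems.append(
            f"{len(sequences)} sequences given; at most {MAX_SEQUENCES} allowed"
        )
    for name, residues in sequences.items():
        if len(residues) > MAX_LENGTH:
            problems.append(
                f"{name}: {len(residues)} residues; at most {MAX_LENGTH} allowed"
            )
    return problems


print(check_submission({"short": "KVFERCELAR", "long": "A" * 2001}))
```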
This server can also be reached by electronic mail at the address:
matchbox at biq.fundp.ac.be
where you can send the keyword "help" in the body of the message to
receive information about submitting data by e-mail.
During a preliminary test period, the server is accessible to anyone
from academic or commercial institutions. A standalone version of the
software with a graphical interface is under development.
INPUT:
The sequences can be submitted in FASTA-like, HSSP, or MSF formats, and
the scoring matrices used by the programs can be selected directly from
the web page.
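
The exact "FASTA-like" layout accepted by the server is described in its
own documentation, not here. As a rough guide, the Python sketch below
reads the common FASTA convention (a ">" header line followed by one or
more lines of residues), which is an assumption about what the server
accepts. The function name parse_fasta is hypothetical and the residues
shown are short fragments taken from the example above.

```python
def parse_fasta(text):
    """Parse conventional FASTA text into a dict of name -> sequence.

    The name is the first word after ">"; sequence lines are joined.
    """
    parts = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else f"seq{len(parts) + 1}"
            if name in parts:
                raise ValueError(f"duplicate sequence name: {name}")
            parts[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


example = """>1ALC fragment
KQFTKCELSQ
NLYDIDGYGR
>1LZ1 fragment
KVFERCELAR
"""
print(parse_fasta(example))
```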
OUTPUT:
By default, the Match-Box server returns two listings, "explore" and
"align". The user can also ask to receive an alignment in MSF or HSSP
format, and a picture of the alignment in PostScript format.
DOCUMENTATION:
The web page gives access to a series of documents: a general
presentation, details about the method, and examples with explanations
of the results.
REFERENCES:
1. Depiereux, E. & Feytmans, E. (1991). Simultaneous and multivariate
   alignment of protein sequences: correspondence between
   physicochemical profiles and structurally conserved regions (SCR).
   Prot. Engng. 4(6), 603-613.
2. Depiereux, E. & Feytmans, E. (1992). MATCH-BOX - A fundamentally new
   algorithm for the simultaneous alignment of several protein
   sequences. Comput. Appl. Biosci. 8(5), 501-509.
3. McKenzie, H.A. & White, F.H. (Jr) (1991). Lysozyme and
   alpha-lactalbumin, function and interrelationships. Adv. Prot. Chem.
   41, 173-258.