ProAnWin update: protein alignment/plots/structure-activity analysis/design
Alexey M. Eroshkin
eroshkin at vector.nsk.su
Wed Jul 2 03:41:43 EST 1997
To: bio-software at dl.ac.uk
From: Alexey Eroshkin <eroshkin at vector.nsk.su>
Cc: mutatiomn at net.bio.net, pop-bio at net.bio.net,
peptides at dl.ac.uk, molmodel at dl.ac.uk, mol-evol at dl.ac.uk, microbio at dl.ac.uk,
immuno at dl.ac.uk, hiv-biol at dl.ac.uk, fluorpro at dl.ac.uk, biophys at dl.ac.uk,
bio-matrix at dl.ac.uk, proteins at dl.ac.uk, virology at dl.ac.uk, xtal-log at dl.ac.uk
Subject: ProAnWin update: protein alignment/plots/structure-activity analysis/design
A new version of ProAnWin (Protein Analyst for Windows 3.11/95) is now publicly
available from IUBio as
ftp://iubio.bio.indiana.edu/molbio/ibmpc/paw.exe (and paw.readme)
If you have access to e-mail only, the program can be obtained
via e-mail by sending the following message:
To: BITFTP at pucc.Princeton.EDU
From: YOUR E-MAIL ADDRESS
ftp iubio.bio.indiana.edu uuencode
The server will return the program UUENCODED in several files.
Running UUDECODE on them yields the archive with the program.
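On a machine without a UUDECODE utility, the decoding step can be sketched in Python with the standard `binascii.a2b_uu` function. This is a minimal sketch, assuming the parts returned by the server have been saved as text and concatenated in the correct order. It decodes only the first `begin ... end` block and skips lines before it, such as mail headers. The function name `uudecode_text` is hypothetical.

```python
from __future__ import annotations

import binascii


def uudecode_text(text: str) -> tuple[str, bytes]:
    """Decode the first 'begin ... end' block of uuencoded text.

    Returns (filename, data). Lines before the 'begin' line (for example,
    mail headers) are ignored.
    """
    lines = text.splitlines()
    # Find the header line: "begin <mode> <filename>".
    for start, line in enumerate(lines):
        if line.startswith("begin "):
            parts = line.split(maxsplit=2)
            filename = parts[2] if len(parts) == 3 else ""
            break
    else:
        raise ValueError("no 'begin' line found")

    data = bytearray()
    for line in lines[start + 1:]:
        if line.strip() == "end":
            return filename, bytes(data)
        if not line:
            continue  # blank lines carry no data
        try:
            data += binascii.a2b_uu(line)
        except binascii.Error:
            # Some encoders append junk after the data; keep only the
            # characters implied by the line's length byte.
            nchars = (((ord(line[0]) - 32) & 63) * 4 + 5) // 3
            data += binascii.a2b_uu(line[:nchars])
    raise ValueError("no 'end' line found")
```

The decoded bytes can then be written to `paw.exe` with `open(name, "wb").write(data)` and unpacked as described below.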
ProAnWin - Protein Analyst for Windows
Multiple sequence alignment, analysis of protein sequences and
structures, structure-activity relationships, design of
Copyright(c)1995-97 I.Pika, A.Frolov, V.Ivanisenko, A.Eroshkin
All Trademarks and Registered Names are acknowledged in this document.
The files required to run ProAnWin are distributed as a single
self-extracting compressed file. Create a directory named "PROANWIN"
on your hard disk (for example, on drive C:) and copy the compressed
file into it. Unpack the program by typing PAW at the DOS prompt and
answering Yes to all questions. Once you have extracted the archive
files, start Windows and then start the program.
This program is provided "AS IS" without any warranty, express or
implied, to you or any other person. The authors will not be liable for
incidental, consequential or other damages arising through the use of
As the program is under further development, the documentation may not
reflect all current program options.
Main directory - program modules
DATA - files with amino acid physico-chemical properties,
manual, examples with input and output files
ALIGNS - aligned sequences of 50 protein families
MAIN PROGRAM FEATURES (* - new feature)
- Makes multiple sequence alignments, both automatic (Clustal V) and
manual, global and local (in a selected region);
- Threads multiple alignment onto known 3-dimensional structure;
- Imports data in all major formats (SWISS-PROT, PIR, FASTA, GCG,
- Imports protein 3D structure from Protein Data Bank files (PDB
- Inputs data on protein activities/properties or phenotypes;
- Transforms activity values (log (A), ln (A), A/K, A+k, etc.);
* Searches for linear and spatial sites that are conserved or variable
with respect to specified physico-chemical properties (for example,
helical hydrophobic moment);
* Searches for linear and spatial sites having high or low values of
specified physico-chemical properties (for example, Kyte-Doolittle
* Plots sets of different physico-chemical profiles for individual
* Plots specified physico-chemical profiles for the set of sequences;
- Searches for linear sites in a multiple protein alignment, and spatial
sites in a protein 3D structure, that influence protein activity/property;
* Plots average physico-chemical profile for the family of sequences;
* Plots the dispersion of physico-chemical profiles across the
family of sequences;
* Plots physico-chemical profiles for protein 3D structure;
- Analyses relationships between site structural characteristics and
protein activities by multiple linear regression analysis;
- Analyses structural differences between proteins divided by
functional, evolutionary or other criteria;
- Investigates physico-chemical factors related to activity changes
in a set of mutant proteins;
* Simulates protein-engineering experiments and predicts protein
activity. Has options for automatic mutant generation (to increase or
decrease protein activity) and for manual mutant generation;
* Predicts activity for newly sequenced proteins;
- Makes protein 3D pictures (mono and stereo) with sites highlighted;
* Has more than 400 amino acid physico-chemical properties;
- Investigates ten types of protein site characteristics, including
average values, helical moments, beta-strand moments, etc.;
- Saves results to disk and pictures to the clipboard;
- Has online help and a manual.
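The profile features in the list above (a physico-chemical profile for each sequence, the average profile of a family, and the dispersion of profiles across the family) can be illustrated with a short Python sketch. It is not ProAnWin's code and rests on several assumptions: the property is the Kyte-Doolittle hydropathy scale, each profile value is a sliding-window average, gap characters and unknown residues contribute no value, and dispersion is taken as the population variance across sequences at each window position. The window size of 5 and the function names are illustrative choices.

```python
from statistics import mean, pvariance

# Kyte-Doolittle hydropathy values (one example property scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def profile(aligned_seq, scale=KYTE_DOOLITTLE, window=5):
    """Sliding-window average of `scale` along one aligned sequence.

    Returns one value per window start position. Characters not in the
    scale (gaps, unknown residues) are skipped; a window with no scored
    residues yields None.
    """
    seq = aligned_seq.upper()
    values = []
    for i in range(len(seq) - window + 1):
        scored = [scale[r] for r in seq[i:i + window] if r in scale]
        values.append(mean(scored) if scored else None)
    return values


def family_profiles(alignment, scale=KYTE_DOOLITTLE, window=5):
    """Average profile and dispersion (population variance) for a family.

    `alignment` is a list of equal-length aligned sequences.
    """
    profiles = [profile(s, scale, window) for s in alignment]
    average, dispersion = [], []
    for column in zip(*profiles):
        vals = [v for v in column if v is not None]
        average.append(mean(vals) if vals else None)
        dispersion.append(pvariance(vals) if vals else None)
    return average, dispersion
```

Positions where the dispersion is small mark regions in which the family keeps the property nearly constant, which is one way to read the "conserved in changes of physico-chemical properties" search described above.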
ProAnWin MAKES IT POSSIBLE TO OBTAIN NEW RESULTS IMPORTANT IN BIOCHEMISTRY,
MOLECULAR BIOLOGY, ETC., AND TO DESIGN PROTEIN ENGINEERING EXPERIMENTS:
1. The program helps to find information that cannot be found by
other programs (activity/property-modulating sites, phenotype
2. The user can analyze structure-activity relationships in both
sequences and 3D structures.
3. The program makes it possible to generate and test many
hypotheses about the role of different sites and their various
physico-chemical characteristics in protein activity, which would be
difficult or impossible with manual analysis.
4. Structure-activity relationships are searched for using multiple
regression analysis, and the results come with statistical estimates
of their reliability.
5. The user can work simultaneously with sequences and the 3D protein
structure (sites marked on the sequence are visualized in the 3D
structure and vice versa).
6. In addition to the conventional average physico-chemical
characteristics of a sequence site, the user can analyze 9 additional
characteristics of sequential (linear) sites and 5 characteristics
of spatial sites.
7. The program considerably reduces the time needed to create
mutant proteins with a desired property.
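Among the site characteristics beyond simple averages, the feature list names helical and beta-strand moments. A common definition is Eisenberg's hydrophobic moment, mu_H = | sum over n of h_n * exp(i * n * delta) |, where h_n is the property value of the n-th residue and delta is the angle between successive side chains (100 degrees for an alpha-helix, roughly 160 to 180 degrees for a beta-strand). The Python sketch below assumes that definition, uses the Kyte-Doolittle scale only as an example property, and returns the unnormalized moment; ProAnWin's own normalization and strand angle are not stated in this announcement.

```python
import math

# Kyte-Doolittle hydropathy values (example property scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(site, scale, angle_deg=100.0):
    """Unnormalized Eisenberg-style moment of a linear site.

    Alignment gaps ('-' or '.') are removed first, so positions count real
    residues only. Residues missing from `scale` keep their position but
    contribute nothing. angle_deg: 100 for an alpha-helix, about 160-180
    for a beta-strand.
    """
    delta = math.radians(angle_deg)
    residues = [r for r in site.upper() if r not in "-."]
    sx = sy = 0.0
    for n, residue in enumerate(residues):
        h = scale.get(residue)
        if h is None:
            continue
        # Each residue's property is a vector rotated by n * delta.
        sx += h * math.cos(n * delta)
        sy += h * math.sin(n * delta)
    return math.hypot(sx, sy)
```

For example, `hydrophobic_moment("IK", KYTE_DOOLITTLE, angle_deg=180)` places the two residues on opposite faces, so their values add in magnitude (4.5 + 3.9 = 8.4).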
HOW TO START
To investigate a protein/peptide family of interest, you need sequence
data file(s) in the current directory. You can use sequence data files
in FASTA (PEARSON), PIR, SWISS-PROT, CLUSTAL or GCG format, or in
INTERNAL 1 format (3 data files with protein names (*.seq), protein
activities or grouping (*.act) and aligned sequences (*.ali); see the
examples in the DATA directory). Protein 3D structures can be taken
from the PDB database.
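Of the input formats listed, FASTA (Pearson) is the simplest. As an illustration only, a minimal Python reader could look like the sketch below. It assumes one '>' header line per record followed by sequence lines, keeps the first word of the header as the protein name, and drops internal spaces and a trailing '*' terminator. It says nothing about the layout of ProAnWin's INTERNAL 1 files, which are documented by the examples in the DATA directory; the function name `read_fasta` is hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs.

    Lines before the first '>' header (e.g. comments) are ignored.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line.replace(" ", "").rstrip("*"))
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```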
To use the program follow the steps:
- start the program;
- select sequences of the family you are going to investigate;
- select a file with required physico-chemical properties of amino acids;
- load protein 3D structure (if available);
- define the fragment to investigate (or up to 8 fragments);
- define factors for analysis;
and so on.
All other information is available in MANUAL.TXT and in HELP.
ProAnWin IS USEFUL IN:
- protein structure-function and structure-activity investigations;
- designing proteins and peptides with improved activity;
- making multiple protein alignments and making sense of them;
- studying phenotype-genotype correlations;
- preparation of protein 3D pictures with sites highlighted;
- protein features analysis;
- comparative protein sequence analysis.
1. Frolov A.S., Pika I.S., Eroshkin A.M. ProMSED: Protein multiple
sequence editor for Windows 3.11/95. CABIOS, 1997, 13, 243-248
2. Morozov B.M., Ivanisenko V.A., Eroshkin A. M., Ugarova N.N.
Computer analysis of relations between bioluminescence color and
primary structure of beetle luciferases: identification of the sites
influencing bioluminescence color. Molec. Biology (Russia), 1996, 30,
3. Ivanisenko V.A., Pika I.S., Pinin S.I., Fomina T.I., Eroshkin A.M.
Studying structure-activity and phenotype-genotype relationships in
protein families. Methods, algorithms and applications. Folding and
Design, 1996, 1, Suppl., p.84.
4. Eroshkin A.M., Fomin V.I., Zhilkin P.A., Ivanisenko V.A.,
Kondrakhin Y.V. PROANAL version 2: multifunctional program for
analysis of multiple protein sequence alignments and studying
structure-activity relationships in protein families. CABIOS, 1995,
5. Eroshkin A.M., Zhilkin P.A., Fomin V.I. Algorithm and computer
program PROANAL for analysis of relationship between structure and
activity in a family of proteins or peptides. CABIOS, 1993, 9,
6. Eroshkin A.M., Minenkova O.O., Fomin V.A., Ivanisenko V.A.,
Ilyichev A.A. Analysis of peptide fragment insertions into major
coat protein of bacteriophages M13, f1 and fd. Relation of protein
structural characteristics and viability of mutant phages. Molec.
Biology (Russia), 1993, 27, 1345-1355.
The distributed version is limited to 15 analyzed sequences. To obtain
the unlimited registered version, please contact the authors.
If you have problems running ProAnWin please consult the manual
and HELP carefully to see if they can help. If you still need advice
then please contact the authors by e-mail: eroshkin at vector.nsk.su
State Research Center of
Virology and Biotechnology "Vector"
Koltsovo, Novosibirsk Region,
Tel: (3832) - 647774
Fax: (3832) - 328831
Ask authors for the updated ProAnWin version and
ADDITIONAL NEW SOFTWARE TOOLS ProMSED2, ProAnalyst:
ProMSED2, MS Windows application for both automatic and manual DNA
and protein sequence alignment, editing, comparison and analysis.
ProMSED2 is an enhancement of ProMSED, made according to users'
remarks and suggestions. The program reads the main sequence formats,
performs automatic alignment, alignment visualization and editing,
and allows sequences to be aligned interactively while leaving
previously aligned regions unchanged. The program has a user-friendly
interface. Manual alignment and sequence analysis are facilitated by
coloring schemes reflecting amino acid similarity in mutational,
physico-chemical and other properties. Although ProMSED was targeted
at protein sequences, it can be used on DNA sequences as well. The
program provides a flexible tool for sequence alignment, analysis,
visualization, editing and presentation.
ftp://iubio.bio.indiana.edu/molbio/ibmpc/promsed2.exe and .readme
The program does or has (+ - NEW or enhanced features):
+ inputs DNA and protein sequences in NBRF/PIR, Pearson (Fasta),
MSF (GCG), EMBL/SwissProt, Intelligenetics and CLUSTAL formats;
o has an interface and functions like other Windows applications
(source file view, font changing, marking/unmarking, block and
sequence selection, cut and paste, UNDO, etc.);
o loads several sequence families in different windows,
adds sequences to existing alignment, combines sequences from
+ outputs the alignment in several popular formats;
+ makes presentation quality color and black-and-white prints of
complete alignment or any selected block;
+ saves alignment picture as Windows metafile and bitmap;
o permits automatic alignment to be applied interactively (with
options to change the alignment parameters) to any selected part
of the sequences or to a marked block;
+ calculates sequence similarity of complete sequences, of any selected
sequence subset or of a marked block, in % and in PAM250 units (PAM250
is an amino acid similarity matrix);
+ calculates a total (average, for %) sequence similarity value as an
estimate of alignment quality;
+ prints sequence similarity matrix;
+ sorts sequences by similarity of complete sequences or marked block;
+ displays conserved and semiconserved positions;
+ has many amino acid coloring schemes aimed to facilitate
manual alignment and understanding protein sequence features.
Some schemes are: EVOLUTIONARY CONSERVATIVE (reflects amino
acid mutational properties), COMPLEX (similarity of amino acids
in physico-chemical properties), HYDROPHOBICITY, CHARGE, BIG
RESIDUES, ALPHA-HELIX, HELIX-BREAKERS, etc. The options to input
user-defined schemes or change the colors of any amino acid
groups are available;
+ searches subsequences and complex sequence patterns;
o has complete HELP.
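The similarity features above can be sketched for the simplest measure, percent identity, in Python. This is an illustration under stated assumptions rather than ProMSED2's algorithm: identity is counted only over alignment columns where neither sequence has a gap, the total value is taken as the mean over all sequence pairs, a column is called conserved when every sequence has the same residue there, and sorting is by identity to one reference sequence. Scoring in PAM250 units would replace the identity test with a lookup in the PAM250 matrix, which is omitted here for brevity. All function names are hypothetical.

```python
from itertools import combinations

GAPS = set("-.")


def percent_identity(a, b):
    """Percent identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in GAPS and y not in GAPS]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


def identity_matrix(seqs):
    """Symmetric matrix of pairwise percent identities (diagonal set to 100)."""
    n = len(seqs)
    m = [[100.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = percent_identity(seqs[i], seqs[j])
    return m


def average_identity(seqs):
    """Mean pairwise identity, a rough indicator of alignment quality."""
    values = [percent_identity(a, b) for a, b in combinations(seqs, 2)]
    return sum(values) / len(values) if values else 100.0


def conserved_columns(seqs):
    """0-based indices of columns where all sequences share one residue."""
    return [i for i, col in enumerate(zip(*seqs))
            if len({c.upper() for c in col}) == 1 and col[0] not in GAPS]


def sort_by_similarity(seqs, reference=0):
    """Sequence indices ordered by identity to the reference, highest first."""
    ref = seqs[reference]
    return sorted(range(len(seqs)),
                  key=lambda k: percent_identity(ref, seqs[k]), reverse=True)
```

For the toy alignment `["MKV-L", "MKIAL", "MRVAL"]`, the first two sequences share 3 of 4 gap-free columns (75%), and columns 0 and 4 are fully conserved.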
ProAnalyst: DOS version of ProAnWin with additional functionality
(single and multiple sequence analysis, profile analysis,
combinatorial libraries, design of protein engineering experiments)
IUBio archive: ftp://iubio.bio.indiana.edu/molbio/ibmpc/panalys1
EMBL library: ftp://ftp.ebi.ac.uk/pub/software/dos/proanalyst
o data conversion from several protein sequence formats (FASTA,
SWISS-PROT, PIR, CLUSTAL).
o databases with more than 50 amino acid physico-chemical properties;
o inputs 3D protein structure in PDB format;
o flexible VISUALIZATION OF PROTEIN 3D STRUCTURES with sites
o inputs user-defined protein activities, properties or related
o searching SITES INFLUENCING PROTEIN ACTIVITY and analyzing
relationships between protein site structural characteristics and
protein activities (properties or related phenotypes);
o multiple linear regression analysis of STRUCTURE-ACTIVITY
relationships, discriminant analysis and ANOVA;
o intra- and cross-group VARIABILITY analysis;
o GENOTYPE -- PHENOTYPE CORRELATION analysis (e.g., for drug
resistance in viruses);
o alphabetical and physico-chemical analysis of protein features
variations (in 1D and 3D structures);
o structure-activity determination profile (SAD);
o investigation of physico-chemical factors related to activity
or property changes in MUTANT PROTEINS;
o searching motifs in COMBINATORIAL LIBRARIES (peptide, phage-
display libraries, etc.) with MOTIF MAPPING on the target protein;
o design of PROTEIN-ENGINEERING experiments;
o ACTIVITY, PROPERTY AND PHENOTYPE PREDICTION;
o sorting sequences by protein activity value, by protein group
number and by number of motifs found;
o mapping results on 3D structure and sequences.
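The structure-activity regression named in both feature lists can be illustrated with ordinary least squares: the activity A_j of protein j is modeled as A_j = b0 + b1*x_j1 + ... + bk*x_jk, where x_jk are site characteristics (for example, the average hydropathy or hydrophobic moment of a site). The Python sketch below solves the normal equations by Gaussian elimination and reports R squared as a simple goodness-of-fit measure. It is an illustration under these assumptions, not the procedure or the reliability statistics used by ProAnWin or ProAnalyst, which this announcement does not specify; `fit_linear` is a hypothetical name.

```python
def fit_linear(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    X: one row per protein, each a list of k site characteristics.
    y: activity values, one per protein.
    Returns ([b0, ..., bk], r_squared).
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # add intercept column
    p = len(rows[0])
    # Augmented normal equations [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("characteristics are collinear or too few proteins")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    # Coefficient of determination R^2.
    pred = [sum(bi * xi for bi, xi in zip(b, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return b, r2
```

With real data, the number of proteins should comfortably exceed the number of characteristics; otherwise R squared is inflated and the coefficients are unreliable, which is why significance tests accompany such fits.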
Dr. Alexey Eroshkin
Institute of Molecular Biology
State Research Center of Virology and Biotechnology "Vector"
Koltsovo, Novosibirsk Region 633159
E-mail: eroshkin at vector.nsk.su
Tel: +7 (3832) - 647774
Fax: +7 (3832) - 328831