ProAnWin update: protein alignment/plots/structure-activity analysis/design

Alexey M. Eroshkin eroshkin at vector.nsk.su
Wed Jul 2 03:41:32 EST 1997

To: bio-software at dl.ac.uk
From: Alexey Eroshkin <eroshkin at vector.nsk.su>
Cc: mutatiomn at net.bio.net, pop-bio at net.bio.net,
peptides at dl.ac.uk, molmodel at dl.ac.uk, mol-evol at dl.ac.uk, microbio at dl.ac.uk,
immuno at dl.ac.uk, hiv-biol at dl.ac.uk, fluorpro at dl.ac.uk, biophys at dl.ac.uk,
bio-matrix at dl.ac.uk, proteins at dl.ac.uk, virology at dl.ac.uk, xtal-log at dl.ac.uk
Subject: ProAnWin update: protein alignment/plots/structure-activity analysis/design

Dear all,

A new version of ProAnWin (Protein Analyst for Windows 3.11/95) is now
publicly available from IUBio as

ftp://iubio.bio.indiana.edu/molbio/ibmpc/paw.exe (and paw.readme)

If you have access to e-mail only, the program can be obtained
via e-mail by sending the following message:

To: BITFTP at pucc.Princeton.EDU
ftp iubio.bio.indiana.edu uuencode
user anonymous
cd molbio/ibmpc
get paw.exe
get paw.readme

The server will return the UUENCODED program in several files.
Running UUDECODE on them will give you the archive with the program.
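UUDECODE is a standard utility, but for readers who want to see what the decoding step actually does, here is a minimal Python sketch built on the standard-library function binascii.a2b_uu. It assumes (this is not part of the original instructions) that the mail headers have already been removed from each returned part and that the parts were concatenated in order, so the text contains a single "begin ... end" block. The function name uudecode is our own.

```python
import binascii


def uudecode(text):
    """Decode one uuencoded stream and return (file_name, data).

    Assumption: mail headers were stripped from every part and the parts
    were joined in order, so the text holds one 'begin <mode> <name>' line,
    the encoded body lines and a closing 'end' line.
    """
    name = None
    data = bytearray()
    in_body = False
    for line in text.splitlines():
        if not in_body:
            if line.startswith("begin "):
                # e.g. 'begin 644 paw.exe' -> file name is the third field
                name = line.split(None, 2)[2].strip()
                in_body = True
            continue
        if line.strip() == "end":
            return name, bytes(data)
        if not line.strip():
            continue  # tolerate blank lines left over from joining parts
        # The first character of each body line encodes its byte count;
        # a2b_uu decodes the whole line ('`' marks a zero-length line).
        data += binascii.a2b_uu(line)
    raise ValueError("no 'end' line found; is a part missing?")
```

Writing the returned bytes to a file (for example, paw.exe) gives the self-extracting archive described below.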

                ProAnWin - Protein Analyst for Windows

                             version 3.01

Multiple sequence alignment, analysis of protein sequences and
   structures, structure-activity relationships, design of
           protein-engineering experiments

   Copyright(c)1995-97 I.Pika, A.Frolov, V.Ivanisenko, A.Eroshkin

All Trademarks and Registered Names are acknowledged in this document.

The files required to run ProAnWin are distributed as a single
self-extracting compressed file. Create a directory "PROANWIN" on
your hard disk (for example, on drive C:) and copy the compressed file
to that directory. Unpack the program (type PAW at the DOS prompt and
answer Yes to all questions). Once you have extracted the archive
files, start Windows and run the program.

This program is provided "AS IS" without any warranty, expressed or
implied, to you or any other person.  The authors will not be liable for
incidental, consequential or other damages arising through the use of
this software.

As the program is under further development the documentation may not
reflect all current program options.


Main directory  - program modules
DATA            - files with amino acid physico-chemical properties,
                  manual, examples with input and output files
ALIGNS          - aligned sequences of 50 protein families

MAIN PROGRAM FEATURES     (* - new feature)

- Makes multiple sequence alignments - automatic (Clustal V) and manual,
global and local (in a selected region);

- Threads multiple alignment onto known 3-dimensional structure;

- Imports data in all major formats (SWISS-PROT, PIR, FASTA, GCG,

- Imports protein 3D structure from Protein Data Bank files (PDB

- Inputs data on protein activities/properties or phenotypes;

- Transforms activity values (log (A), ln (A), A/K, A+k, etc.);

* Searches for linear and spatial sites that are conserved or variable
with respect to specified physico-chemical properties (for example,
helical hydrophobic moment);

* Searches for linear and spatial sites having high and low values of
specified physico-chemical properties (for example, Kyte-Doolittle

* Plots sets of different physico-chemical profiles for an individual
protein sequence;

* Plots specified physico-chemical profiles for the set of sequences;

- Searches for linear sites in a multiple protein alignment and spatial
sites in a protein 3D structure that influence protein activity/property;

* Plots average physico-chemical profile for the family of sequences;

* Plots the dispersion of physico-chemical profiles across the
family of sequences;

* Plots physico-chemical profiles for protein 3D structure;

- Analyses relationships between site structural characteristics and
protein activities by multiple linear regression analysis;

- Analyses structural differences between proteins divided by
functional, evolutionary or other criteria;

- Investigates physico-chemical factors related to activity changes
in a set of mutant proteins;

* Simulates protein-engineering experiments and predicts protein
activity. Has options for automatic mutant generation (to increase or
decrease protein activity) and for manual mutant generation;

* Predicts activity for newly sequenced proteins;

- Makes protein 3D pictures (mono and stereo) with sites highlighted;

* Has more than 400 amino acid physico-chemical properties;

- Investigates ten types of protein site characteristics, including
average values, helical moments, beta-strand moments, etc.;

- Saves results to the disk and saves pictures to the clipboard;

- Has help and manual.
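To illustrate two of the site characteristics listed above (the average property value over a window and the helical or beta-strand moment), here is a minimal Python sketch. It uses the Kyte-Doolittle hydropathy scale and Eisenberg's hydrophobic-moment formula; the function names, the 9-residue default window and the choice of scale are assumptions made for illustration, and ProAnWin's own property files and exact definitions may differ.

```python
import math

# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(seq, scale, window=9):
    """Average property value over a sliding window.

    Returns len(seq) - window + 1 values (an empty list if the sequence
    is shorter than the window).
    """
    values = [scale[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def hydrophobic_moment(segment, scale, angle_deg=100.0):
    """Eisenberg-style hydrophobic moment of a segment.

    Residue n contributes a vector of length scale[aa] pointing at
    n * angle_deg degrees; the moment is the length of the vector sum.
    100 degrees models an ideal alpha-helix; about 160-180 degrees is
    often used for beta-strands.
    """
    delta = math.radians(angle_deg)
    values = [scale[aa] for aa in segment.upper()]
    sin_sum = sum(h * math.sin(n * delta) for n, h in enumerate(values))
    cos_sum = sum(h * math.cos(n * delta) for n, h in enumerate(values))
    return math.hypot(sin_sum, cos_sum)
```

For example, window_profile(seq, KYTE_DOOLITTLE, 9) gives the values for a hydropathy plot, and hydrophobic_moment(seq[i:i + 11], KYTE_DOOLITTLE, 100.0) scores the amphipathicity of one 11-residue window as a putative helix. Swapping in another dictionary reuses both functions with any other amino acid property.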


1. The program helps to find information that cannot be found with
other programs (activity/property-modulating sites, phenotype-defining
regions);

2. Users can analyze structure-activity relationships in both
sequences and 3D structures.

3. The program makes it possible to generate and test many
hypotheses about the role of different sites and their various
physico-chemical characteristics in protein activity, which would be
difficult or impossible with "hand-operated" analysis.

4. Structure-activity relationships are searched for using multiple
regression analysis, and the results come with statistical estimates
of their reliability.

5. Users can work with sequences and the 3D protein structure
simultaneously (sites marked on the sequence are visualized in the 3D
structure and vice versa).

6. In addition to the conventional average physico-chemical
characteristics of a sequence site, users can analyze 9 additional
characteristics of sequential (linear) sites and 5 characteristics
of spatial sites.

7. The program can considerably reduce the time needed to create
mutant proteins with a desired property.
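Point 4 relies on multiple linear regression. The announcement does not describe ProAnWin's exact procedure (variable selection, which tests are reported), so the following is only a minimal Python sketch of ordinary least squares under that assumption: each protein is a row of site characteristics (for example, mean hydropathy of a site and its helical moment), activity is the response, and the fit is summarized by R-squared and the overall F statistic. The function names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    k = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: site characteristics are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * k
    for i in range(k - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (aug[i][k] - tail) / aug[i][i]
    return x


def fit_activity_model(site_values, activities):
    """Least-squares fit of activity = b0 + b1*x1 + ... + bp*xp.

    site_values: one row per protein, one column per site characteristic.
    Returns (coefficients, r_squared, f_statistic).
    """
    n, p = len(activities), len(site_values[0])
    k = p + 1
    design = [[1.0] + [float(v) for v in row] for row in site_values]
    # Normal equations (X^T X) b = X^T y
    xtx = [[sum(design[r][i] * design[r][j] for r in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(design[r][i] * activities[r] for r in range(n)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)

    fitted = [sum(b * x for b, x in zip(coef, row)) for row in design]
    mean_y = sum(activities) / n
    ss_tot = sum((y - mean_y) ** 2 for y in activities)
    if ss_tot == 0:
        raise ValueError("all activities are equal; nothing to explain")
    ss_res = sum((y - f) ** 2 for y, f in zip(activities, fitted))
    r_squared = 1.0 - ss_res / ss_tot
    df_res = n - k
    if df_res > 0 and ss_res > 0:
        # F = (explained variance per predictor) / (residual variance)
        f_stat = ((ss_tot - ss_res) / p) / (ss_res / df_res)
    else:
        f_stat = float("inf")  # perfect fit or no residual degrees of freedom
    return coef, r_squared, f_stat
```

The F statistic would be compared with the F distribution on (p, n - p - 1) degrees of freedom to judge whether the relationship is statistically reliable; with few proteins and many candidate sites, such tests need correction for multiple comparisons.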


To investigate a protein/peptide family of interest, you need sequence
data file(s) in the current directory. You can use sequence data files
in FASTA (PEARSON), PIR, SWISS-PROT, CLUSTAL or GCG format, or in
INTERNAL 1 format (3 data files with protein names (*.seq), protein
activities or grouping (*.act) and aligned sequences (*.ali); see the
examples in the DATA directory). 3D protein structures can be taken
from the PDB database.
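For users preparing input files, here is a minimal Python sketch of what reading an aligned FASTA (Pearson) file involves, together with a check that all aligned sequences have the same length. It is illustrative only: the names read_fasta and check_aligned are hypothetical, treating ';' lines as comments is an assumption, and the sketch does not read the INTERNAL 1 (*.seq/*.act/*.ali) files, whose layout is shown by the examples in the DATA directory.

```python
def read_fasta(text):
    """Parse FASTA (Pearson) text into a list of (name, sequence) pairs.

    The name is the first word after '>'; sequence lines are joined with
    all whitespace removed and upper-cased. Lines starting with ';' are
    treated as comments, and text before the first header is ignored.
    """
    records = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append("".join(line.split()).upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_aligned(records):
    """Return the alignment length, or raise if sequence lengths differ."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError(f"sequences have different lengths: {sorted(lengths)}")
    return lengths.pop()
```

Checking lengths before analysis catches the most common problem with hand-edited alignments: a sequence that lost or gained a gap character.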

To use the program follow the steps:

- start the program;
- select sequences of the family you are going to investigate;
- select a file with required physico-chemical properties of amino acids;
- load protein 3D structure (if available);
- define an investigated fragment (or up to 8 fragments);
- define factors for analysis;

and so on.

For all other information, see MANUAL.TXT or HELP.


- protein structure-function and structure-activity investigations;
- designing proteins and peptides with improved activity;
- making multiple protein alignments and making sense of them;
- studying phenotype-genotype correlations;
- preparation of protein 3D pictures with sites highlighted;
- protein features analysis;
- comparative protein sequence analysis.


1. Frolov A.S., Pika I.S., Eroshkin A.M. ProMSED: Protein multiple
sequence editor for Windows 3.11/95. CABIOS, 1997, 13, 243-248.

2. Morozov B.M., Ivanisenko V.A., Eroshkin A.M., Ugarova N.N.
Computer analysis of relations between bioluminescence color and
primary structure of beetle luciferases: identification of the sites
influencing bioluminescence color. Molec. Biology (Russia), 1996, 30,

3. Ivanisenko V.A., Pika I.S., Pinin S.I., Fomina T.I., Eroshkin A.M.
Studying structure-activity and phenotype-genotype relationships in
protein families. Methods, algorithms and applications. Folding and
Design, 1996, 1, Suppl., p.84.

4. Eroshkin A.M., Fomin V.I., Zhilkin P.A., Ivanisenko V.A.,
Kondrakhin Y.V. PROANAL version 2: multifunctional program for
analysis of multiple protein sequence alignments and studying
structure-activity relationships in protein families. CABIOS, 1995,
11, 39-44.

5. Eroshkin A.M., Zhilkin P.A., Fomin V.I. Algorithm and computer
program PROANAL for analysis of relationship between structure and
activity in a family of proteins or peptides. CABIOS, 1993, 9,

6. Eroshkin A.M., Minenkova O.O., Fomin V.A., Ivanisenko V.A.,
Ilyichev A.A. Analysis of peptide fragment insertions into major
coat protein of bacteriophages M13, f1 and fd. Relation of protein
structural characteristics and viability of mutant phages. Molec.
Biology (Russia), 1993, 27, 1345-1355.

This version is limited to 15 analyzed sequences. To obtain the
unlimited registered version, please contact the authors. If you have
problems running ProAnWin, please consult the manual and HELP first.
If you still need advice, please contact the authors by e-mail:
eroshkin at vector.nsk.su

State Research Center of
Virology and Biotechnology "Vector"
Koltsovo, Novosibirsk Region,
633159  Russia
Tel: (3832) - 647774
Fax: (3832) - 328831

Ask authors for the updated ProAnWin version and


ProMSED2 is an MS Windows application for both automatic and manual
DNA and protein sequence alignment, editing, comparison and analysis.
ProMSED2 is an enhancement of ProMSED based on users' remarks and
suggestions. The program reads the main sequence formats and performs
automatic alignment, alignment visualization and editing; it also
allows sequences to be aligned interactively while leaving previously
aligned regions unchanged. The program has a user-friendly interface.
Manual alignment and sequence analysis are facilitated by coloring
schemes reflecting amino acid similarity in mutational,
physico-chemical and other properties. Although ProMSED was targeted
at protein sequences, it can be used on DNA sequences as well. The
program provides a flexible tool for sequence alignment, analysis,
visualization, editing and presentation.


EMBL library:
IUBio archive:
ftp://iubio.bio.indiana.edu/molbio/ibmpc/promsed2.exe and .readme

The program does or has (+ - NEW or enhanced features):

+  inputs DNA and protein sequences in NBRF/PIR, Pearson (Fasta),
   MSF (GCG), EMBL/SwissProt, Intelligenetics and CLUSTAL formats;
o  has an interface and functions like those of other Windows applications
   (source file view, font changing, marking/unmarking, block and
   sequence selection, cut and paste, UNDO, etc.);
o  loads several sequence families in different windows,
   adds sequences to existing alignment, combines sequences from
   various files;
+  outputs the alignment in several popular formats;
+  makes presentation quality color and black-and-white prints of
   complete alignment or any selected block;
+  saves alignment picture as Windows metafile and bitmap;
o  permits automatic alignment to be applied interactively (with
   options to change the alignment parameters) to any selected part
   of the sequences or to a marked block;
+  calculates sequence similarity of complete sequences, of any selected
   sequence subset or of marked block in % and in PAM250 units (matrix
   of amino acid similarity);
+  calculates total (average for %) sequence similarity value - an
   estimation of alignment quality;
+  prints sequence similarity matrix;
+  sorts sequences by similarity of complete sequences or marked block;
+  displays conserved and semiconserved positions;
+  has many amino acid coloring schemes aimed to facilitate
   manual alignment and understanding protein sequence features.
   Some schemes are: EVOLUTIONARY CONSERVATIVE (reflects amino
   acid mutational properties), COMPLEX (similarity of amino acids
   in physico-chemical properties), HYDROPHOBICITY, CHARGE, BIG
   RESIDUES, ALPHA-HELIX, HELIX-BREAKERS, etc. The options to input
   user-defined schemes or change the colors of any amino acid
   groups are available;
+  searches subsequences and complex sequence patterns;
o  has complete HELP.
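The similarity features above (a pairwise similarity matrix in %, and a total average similarity used as an estimate of alignment quality) can be sketched in Python as follows. This is an assumption-laden illustration, not ProMSED2's code: percent identity here ignores columns where either sequence has a gap (other definitions divide by the full alignment length), the PAM250-based score is not covered, and the function names are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identical residues between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identity_matrix(seqs):
    """Symmetric matrix of pairwise percent identities."""
    return [[percent_identity(a, b) for b in seqs] for a in seqs]


def average_identity(seqs):
    """Mean pairwise identity over all pairs, a rough alignment-quality score."""
    pairs = [(i, j) for i in range(len(seqs)) for j in range(i + 1, len(seqs))]
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_identity(seqs[i], seqs[j]) for i, j in pairs) / len(pairs)
```

Sorting sequences by similarity then amounts to sorting rows of identity_matrix, for example by their identity to a chosen reference sequence.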


ProAnalyst: DOS version of ProAnWin with additional functionality
(single and multiple sequence analysis, profile analysis,
combinatorial libraries, design of protein-engineering experiments)


IUBio archive: ftp://iubio.bio.indiana.edu/molbio/ibmpc/panalys1
EMBL library: ftp://ftp.ebi.ac.uk/pub/software/dos/proanalyst


o   data conversion from several protein sequence formats (FASTA,
o   databases with more than 50 amino acid physico-chemical properties;
o   inputs 3D protein structure in PDB format;
o   inputs user-defined protein activities, properties or related
o   searching SITES INFLUENCING PROTEIN ACTIVITY and analyzing
    relationships between protein site structural characteristics and
    protein activities (properties or related phenotypes);
o   multiple linear regression analysis of STRUCTURE-ACTIVITY
    relationships, discriminant analysis and ANOVA;
o   intra and cross group VARIABILITY analysis;
o   GENOTYPE -- PHENOTYPE CORRELATION analysis (e.g., for drug
    resistance in viruses);
o   alphabetical and physico-chemical analysis of variations in protein
    features (in 1D and 3D structures);
o   structure-activity determination profile (SAD);
o   investigation of physico-chemical factors related to activity
    or property changes in MUTANT PROTEINS;
o   searching motifs in COMBINATORIAL LIBRARIES (peptide, phage-
    display libraries, etc.) with MOTIF MAPPING on the target protein;
o   design of PROTEIN-ENGINEERING experiments;
o   sorting sequences by protein activity value, by protein group
    number and by number of motifs found;
o   mapping results on 3D structure and sequences.
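Two of the ProAnalyst features above, searching motifs in combinatorial libraries with mapping onto the target protein and sorting sequences by the number of motifs found, can be sketched in Python as follows. The announcement does not describe ProAnalyst's motif syntax, so motifs are written here as Python regular expressions with upper-case residue letters (an assumption), overlapping matches are counted, and all function names are hypothetical.

```python
import re


def count_motif(seq, pattern):
    """Number of (possibly overlapping) matches of a regex motif in seq."""
    # A zero-width lookahead lets matches overlap, e.g. 'AA' occurs twice in 'AAA'.
    return sum(1 for _ in re.finditer(f"(?=(?:{pattern}))", seq.upper()))


def map_motif(target, pattern):
    """1-based start positions of a motif in the target protein sequence."""
    return [m.start() + 1 for m in re.finditer(f"(?=(?:{pattern}))", target.upper())]


def sort_library(library, patterns):
    """Sort (name, sequence) library entries by total motifs found, descending.

    Entries with equal counts keep their original order (sorted is stable).
    """
    scored = [
        (name, sum(count_motif(seq, p) for p in patterns))
        for name, seq in library
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, a consensus found among phage-display peptides can be passed to map_motif together with the target protein sequence to locate candidate binding regions, which could then be highlighted on the 3D structure.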

Dr. Alexey Eroshkin               Institute of Molecular Biology
E.mail: eroshkin at vector.nsk.su    State Research Center of Virology and
Tel: +7 (3832) - 647774           Biotechnology "Vector"
Fax: +7 (3832) - 328831           Koltsovo, Novosibirsk Region 633159
