Equivalent amino acids for similarity searches

Evan W. Steeg steeg at cs.toronto.edu
Thu Mar 24 14:41:18 EST 1994


In article <1994Mar23.235848.60877 at kuhub.cc.ukans.edu> pgegen at kuhub.cc.ukans.edu (Peter Gegenheimer) writes:
>One point to consider: the amino acid substitutions which are ALLOWED BY NATURE 
>during evolution are NOT necessarily the same as those which preserve the "chemical 
>similarity" of the side-chain. A number of follow-up postings have called attention 
>to the various evolutionary substitution matrices available. I should like to add two 
>further helpful items of information.
>
>1) If you are working with one member of a family of (highly-conserved) proteins, 
>and you have their amino acid sequences aligned in a way you KNOW to be correct 
>(i.e., all residues in one column had the same evolutionary ancestor), then the
>OBSERVED variation at each position is a good clue to the ALLOWED variation at that 
>position!
>
>2) The best work on amino acid substitutions from a combined chemical and 
>evolutionary angle is that of D. Bordo and P. Argos, Suggestions for "safe" residue 
>substitutions in site-directed mutagenesis, J. Mol. Biol. 217, 721-729 (1991). This
>work examines nine conserved "families" of proteins all of whose 3-D crystal 
>structures are known, and tabulates the allowed substitutions for amino acids in the
>same 3-D environment.
>
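
  Point 1) above is easy to try out.  What follows is only a minimal sketch,
assuming you already have a trusted alignment held as equal-length strings;
the function name, the gap character "-" and the toy sequences are all made
up for illustration and come from no particular package.

```python
from collections import Counter


def observed_variation(aligned_seqs, gap="-"):
    """Return one Counter per alignment column with the residues seen there.

    Gap characters are skipped. The alignment is assumed to be correct,
    i.e. every column holds evolutionarily equivalent residues.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")
    columns = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned_seqs if s[i] != gap)
        columns.append(counts)
    return columns


# Hypothetical toy alignment, invented for illustration only.
toy_alignment = [
    "MKVLA",
    "MRVIA",
    "MKILA",
    "MK-LG",
]

variation = observed_variation(toy_alignment)
for pos, counts in enumerate(variation, start=1):
    # Print the column number and the set of residues observed in it.
    print(pos, "".join(sorted(counts)))
```

  For the toy data the five columns give M, KR, IV, IL and AG.  With only a
handful of sequences the observed set is a lower bound on what is allowed,
so treat it as a clue rather than as a complete list.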

  Also, if you want to look at "equivalence" or "similarity" between
amino acids directly in terms of their physico-chemical properties,
you'll find the following reference useful:

@article        ( nakai-props,
key     =       "Nakai and Kidera and Kanehisa" ,
author  =       "Nakai, K. and Kidera, A. and Kanehisa, M." ,
title   =       "Cluster Analysis of Amino Acid Indices for Prediction of
                 Protein Structure and Function",
journal =       "Protein Engineering",
year    =       "1988" ,
volume  =       "2" ,
number  =       "2" ,
pages   =       "93-100"
)

  In it, the authors do an extensive statistical analysis of the relationships
among the dozens of different property indices proposed by protein scientists
over the years.  Their results let one choose, for example, a very small
number of properties -- a "basis set", essentially -- with which to
represent protein sequences in computational algorithms.
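
  To make the "basis set" idea concrete, here is a minimal sketch of turning
a sequence into per-residue property vectors.  The two properties used
(Kyte-Doolittle hydropathy and a nominal side-chain charge near neutral pH)
are my own illustrative choice, NOT the set derived by Nakai et al., and the
names HYDROPATHY, CHARGE and encode are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
# Chosen only as an example index; not the Nakai et al. basis set.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Nominal side-chain charge near neutral pH. Simplifying assumption:
# histidine and all residues not listed here are treated as neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def encode(sequence):
    """Map a one-letter protein sequence to (hydropathy, charge) tuples."""
    vectors = []
    for aa in sequence.upper():
        if aa not in HYDROPATHY:
            raise ValueError(f"unknown residue: {aa!r}")
        vectors.append((HYDROPATHY[aa], CHARGE.get(aa, 0)))
    return vectors


# Example: a three-residue peptide.
print(encode("KDL"))
```

  Here encode("KDL") gives [(-3.9, 1), (-3.5, -1), (3.8, 0)].  In practice
one would take the properties from the clusters reported in the paper and
rescale them to comparable ranges before feeding them to an algorithm.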

  -- Evan


-- 

Evan W. Steeg (416) 978-5182              steeg at ai.toronto.edu 
Dept of Computer Science                  steeg at t13.lanl.gov 
University of Toronto,           
Toronto, Canada M5S 1A4         


