Protein distance/similarity measure

ketchup Ketchup*REMOVE* at concentric.net
Tue May 19 13:11:13 EST 1998


Dear Brian,
  
Try GeneDoc.  It is a multiple sequence alignment editor and analysis
program.  As such, it requires that the sequences already be aligned.
Given an alignment, it will calculate the percent conservative
substitutions.  This is the percent identity plus the percent of
non-identical aligned residue pairs that have a positive score in the
selected similarity scoring matrix.  You have
several choices of similarity matrix, either the Dayhoff PAM series or
the Henikoff & Henikoff BLOSUM series.  For what you want to do, it
sounds like making the measurements in the context of a consistent
multiple sequence alignment would be better than uncoupled pairwise
measures of percent similarity anyway.
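
To make the measure concrete, here is a minimal Python sketch of the
calculation described above.  It is not GeneDoc's code.  The function
names, the tiny score table, and the choice to skip gapped columns are
all assumptions made for illustration; a real run would use the full
PAM or BLOSUM matrix you select.

```python
# Small illustrative excerpt of substitution scores (treated as symmetric).
# These values are placeholders for the example; load the complete PAM or
# BLOSUM matrix of your choice for real work.
SCORES = {
    ("L", "I"): 2,
    ("L", "V"): 1,
    ("L", "R"): -2,
    ("K", "R"): 2,
    ("D", "E"): 2,
}


def pair_score(a, b):
    """Look up a substitution score in either orientation."""
    if (a, b) in SCORES:
        return SCORES[(a, b)]
    # Assumption: pairs missing from the excerpt count as non-conservative.
    return SCORES.get((b, a), 0)


def percent_conservation(seq1, seq2, gap="-"):
    """Return (percent identity, percent conservative) for two aligned rows.

    Percent conservative = identities plus non-identical pairs whose
    matrix score is positive, over the number of compared columns.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must come from the same alignment")
    identical = similar = compared = 0
    for a, b in zip(seq1, seq2):
        if a == gap or b == gap:
            continue  # assumption: columns with a gap are not counted
        compared += 1
        if a == b:
            identical += 1
        elif pair_score(a, b) > 0:
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    pct_identity = 100.0 * identical / compared
    pct_conservative = 100.0 * (identical + similar) / compared
    return pct_identity, pct_conservative


# Ten Leu against eight Leu plus two Ile, then against two Arg instead.
print(percent_conservation("LLLLLLLLLL", "LLLLLLLLII"))
print(percent_conservation("LLLLLLLLLL", "LLLLLLLLRR"))
```

With this table the Leu->Ile pair counts as conservative and the
Leu->Arg pair does not, so the two comparisons share the same percent
identity but differ in percent conservative substitutions.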
  
GeneDoc runs on any Windows platform and can be downloaded from:
  
   http://www.cris.com/~Ketchup/GeneDoc.shtml
  



Brian Foley <btf at t10.lanl.gov> wrote:

>	I am in need of a program to calculate pairwise 
>similarity scores between amino acid sequences.  I need the
>score to be in the form of % similarity.

>	PIMA is the closest I can find to what I want.  But
>it gives a score that is dependent on the sequence length and
>composition.  A sequence of length 187 amino acids compared
>to itself (100% identity) gives a score of 1077 for example,
>while another sequence of length 188 gives a score of 1062.

>	I have tried the protdist program from Joe Felsenstein's
>PHYLIP package, but that does not just look at protein similarity,
>it estimates the DNA evolutionary distance given the protein 
>sequences.

>	What I want to do is compare the DNA distance between 
>pairs of sequences, to the protein distance for the same pairs, to
>look for evidence of selective pressure.  I want to go beyond
>a synonymous/non-synonymous substitution ratio and see if
>the non-synonymous substitutions in one protein tend to be less
>conservative than those in another.  i.e. two proteins might
>be encoded by genes that have the same average DNA diversity,
>but one tends to have conservative (Leu->Ile for example) AA
>changes, while the other has more radical changes (Leu->Arg).

>	Here is what I have tried so far:

>FASTA  gives % identity, but no similarity score based on 
>	amino acid structure

>BLAST	does not allow gaps, does not necessarily use the
>	full length of the sequence, does not score based on 
>	amino acid structure

>protdist  can do structure-based score, but also adds in 
>	estimate of DNA distance?  For example a sequence
>        of 10 Leucines compared to a sequence of 8 Leucines
>        and 2 Isoleucines (80% identity and greater than
>        80% similarity because Leu is similar to Ile) gives
>        a protdist result of: 0.3854.

>PIMA    does not give score as a % of total possible score.
>        gives different values for identities of different amino
>        acids (P=P match scores 7 while G=G match scores 5).

>	I appreciate any advice you might have on finding a program.
> ____________________________________________________________________
>|Brian T. Foley               btf at t10.lanl.gov                       |
>|HIV Database                 (505) 665-1970                         |
>|Los Alamos National Lab      http://hiv-web.lanl.gov/index.html     |
>|Los Alamos, NM 87544  U.S.A. http://www.t10.lanl.gov/~btf/home.html |
>|____________________________________________________________________|






