Protein distance/similarity measure

Brian Foley btf at t10.lanl.gov
Mon May 18 17:55:20 EST 1998

	I need a program to calculate pairwise 
similarity scores between amino acid sequences.  I need the
score to be expressed as % similarity.

	PIMA is the closest I can find to what I want.  But
it gives a score that is dependent on the sequence length and
composition.  A sequence of length 187 amino acids compared
to itself (100% identity) gives a score of 1077 for example,
while another sequence of length 188 gives a score of 1062.
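One way to remove the length and composition dependence would be to divide each pairwise score by the scores the two sequences get against themselves. The following is a minimal sketch of that idea, not something PIMA does. It assumes the sequences are already aligned, it uses only the two identity scores quoted further down in this post (P=P is 7, G=G is 5), and the mismatch score of 0 and the choice of the mean self-score as the denominator are assumptions made for illustration.

```python
# Per-residue identity scores quoted in the post for PIMA (P=P is 7, G=G is 5).
# Only these two residues are covered; a real matrix would list all twenty.
MATCH_SCORE = {"P": 7, "G": 5}
MISMATCH_SCORE = 0  # assumption: the post does not give mismatch scores


def raw_score(a, b):
    """Sum per-position scores for two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(MATCH_SCORE[x] if x == y else MISMATCH_SCORE for x, y in zip(a, b))


def percent_of_self_score(a, b):
    """Scale the pair score by the mean of the two self-comparison scores."""
    max_possible = (raw_score(a, a) + raw_score(b, b)) / 2
    return 100.0 * raw_score(a, b) / max_possible


print(percent_of_self_score("PPGG", "PPGG"))  # 100.0
print(percent_of_self_score("PPGG", "PPGP"))  # 76.0
```

With this scaling any sequence compared to itself scores 100, regardless of its length or composition.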

	I have tried the protdist program from Joe Felsenstein's
PHYLIP package, but that does not just look at protein similarity;
it estimates the DNA evolutionary distance given the protein 
sequences.

	What I want to do is compare the DNA distance between 
pairs of sequences to the protein distance for the same pairs, to
look for evidence of selective pressure.  I want to go beyond
a synonymous/non-synonymous substitution ratio and see if
the non-synonymous substitutions in one protein tend to be less
conservative than those in another.  That is, two proteins might
be encoded by genes that have the same average DNA diversity,
but one tends to have conservative (Leu->Ile for example) AA
changes, while the other has more radical changes (Leu->Arg).
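The conservative versus radical comparison could be sketched roughly as below. This is only an illustration: the grouping of amino acids by side-chain chemistry is a hypothetical one chosen so that Leu->Ile counts as conservative and Leu->Arg as radical, and the function names are made up here. The sequences are assumed to be aligned already, with "-" marking gaps.

```python
# Hypothetical grouping by side-chain chemistry (an assumption, not a standard).
GROUPS = ["AVLIM", "FWY", "STNQ", "KRH", "DE", "C", "G", "P"]
GROUP_OF = {aa: i for i, members in enumerate(GROUPS) for aa in members}


def classify_changes(a, b):
    """Count (conservative, radical) substitutions in two aligned sequences."""
    conservative = radical = 0
    for x, y in zip(a, b):
        if x == y or "-" in (x, y):
            continue  # skip identical positions and gaps
        if GROUP_OF[x] == GROUP_OF[y]:
            conservative += 1
        else:
            radical += 1
    return conservative, radical


def radical_fraction(a, b):
    """Fraction of amino acid substitutions that cross group boundaries."""
    conservative, radical = classify_changes(a, b)
    total = conservative + radical
    return radical / total if total else 0.0


print(classify_changes("LLLL", "LIRL"))  # (1, 1)
print(radical_fraction("LLLL", "LIRL"))  # 0.5
```

Two proteins with the same DNA diversity could then be compared by their radical fractions.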

	Here is what I have tried so far:

FASTA  gives % identity, but no similarity score based on 
	amino acid structure

BLAST	does not allow gaps, does not necessarily use the
	full length of the sequence, does not score based on 
	amino acid structure

protdist  can do structure-based score, but also adds in an
	estimate of DNA distance?  For example a sequence
        of 10 Leucines compared to a sequence of 8 Leucines
        and 2 Isoleucines (80% identity and greater than
        80% similarity because Leu is similar to Ile) gives
        a protdist result of: 0.3854.

PIMA    does not give score as a % of total possible score.
        gives different values for identities of different amino
        acids (P=P match scores 7 while G=G match scores 5).
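For reference, the kind of number being asked for in the Leu/Ile example can be sketched in a few lines. This is a minimal illustration, assuming pre-aligned sequences of equal length and a small, hypothetical set of "similar" residue pairs in which a similar pair counts the same as an identity; a weighted scheme would give a value somewhere between the identity and this figure.

```python
# Illustrative set of residue pairs treated as similar (an assumption).
SIMILAR_PAIRS = {
    frozenset("LI"), frozenset("LV"), frozenset("IV"),
    frozenset("KR"), frozenset("DE"),
}


def percent_identity_and_similarity(a, b):
    """Return (% identity, % similarity) for two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(x == y for x, y in zip(a, b))
    similar = sum(
        x != y and frozenset((x, y)) in SIMILAR_PAIRS for x, y in zip(a, b)
    )
    n = len(a)
    return 100.0 * identical / n, 100.0 * (identical + similar) / n


# 10 Leucines against 8 Leucines followed by 2 Isoleucines
print(percent_identity_and_similarity("L" * 10, "L" * 8 + "I" * 2))  # (80.0, 100.0)
```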

	I appreciate any advice you might have on finding a program.
|Brian T. Foley               btf at t10.lanl.gov                       |
|HIV Database                 (505) 665-1970                         |
|Los Alamos National Lab      http://hiv-web.lanl.gov/index.html     |
|Los Alamos, NM 87544  U.S.A. http://www.t10.lanl.gov/~btf/home.html |

More information about the Mol-evol mailing list
