Protein distance/similarity measure
btf at t10.lanl.gov
Mon May 18 17:55:20 EST 1998
I am in need of a program to calculate pairwise
similarity scores between amino acid sequences. I need the
score to be in the form of % similarity.
PIMA is the closest I can find to what I want. But
it gives a score that is dependent on the sequence length and
composition. A sequence of length 187 amino acids compared
to itself (100% identity) gives a score of 1077 for example,
while another sequence of length 188 gives a score of 1062.
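
One way to remove that dependence on length and composition is to divide
the raw score by a self-score. The sketch below assumes the two sequences
are already aligned to equal length with no gap characters. The scoring
table is a toy: only P=7 and G=5 come from the PIMA behaviour described in
this post, every other value is made up, and the function names are
hypothetical rather than part of PIMA or any other package.

```python
# Toy per-residue scores. P=7 and G=5 are the PIMA identity scores quoted
# in this post; every other value is invented for illustration.
TOY_SCORES = {
    ("P", "P"): 7, ("G", "G"): 5,
    ("L", "L"): 4, ("I", "I"): 4, ("R", "R"): 5,
    ("L", "I"): 2, ("L", "R"): -2, ("I", "R"): -3,
}


def pair_score(x, y):
    """Symmetric lookup; pairs missing from the toy table score 0."""
    return TOY_SCORES.get((x, y), TOY_SCORES.get((y, x), 0))


def raw_score(a, b):
    """Sum of per-column scores for two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(pair_score(x, y) for x, y in zip(a, b))


def percent_of_max(a, b):
    """Raw score as a percentage of the smaller of the two self-scores."""
    best = min(raw_score(a, a), raw_score(b, b))
    if best <= 0:
        raise ValueError("self-score must be positive")
    return 100.0 * raw_score(a, b) / best


# Raw self-scores differ with composition, normalized ones do not.
print(raw_score("PPPP", "PPPP"), raw_score("GGGG", "GGGG"))
print(percent_of_max("PPPP", "PPPP"), percent_of_max("GGGG", "GGGG"))
```

Dividing by the smaller self-score is only one possible choice; the mean or
the geometric mean of the two self-scores would be equally reasonable.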
I have tried the protdist program from Joe Felsenstein's
PHYLIP package, but it does not just look at protein similarity;
it estimates the DNA evolutionary distance given the protein sequences.
What I want to do is compare the DNA distance between
pairs of sequences to the protein distance for the same pairs, to
look for evidence of selective pressure. I want to go beyond
a synonymous/non-synonymous substitution ratio and see if
the non-synonymous substitutions in one protein tend to be less
conservative than those in another. That is, two proteins might
be encoded by genes that have the same average DNA diversity,
but one tends to have conservative AA changes (Leu->Ile for
example), while the other has more radical changes (Leu->Arg).
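
To make the comparison concrete, here is a minimal sketch of the two
quantities involved: an uncorrected DNA distance, and a count of
conservative versus radical replacements in the matching protein pair. The
residue classes are an assumption chosen for illustration (other groupings
are just as defensible), and the function names are hypothetical.

```python
# Hypothetical physicochemical classes; other groupings are equally defensible.
CLASSES = ["AVLIM", "FWY", "STNQ", "KRH", "DE", "C", "G", "P"]
CLASS_OF = {aa: i for i, members in enumerate(CLASSES) for aa in members}


def p_distance(a, b):
    """Fraction of aligned columns that differ (no multiple-hit correction)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def count_replacements(a, b):
    """Return (conservative, radical) counts for an aligned protein pair."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    conservative = radical = 0
    for x, y in zip(a, b):
        if x == y or x not in CLASS_OF or y not in CLASS_OF:
            continue  # skip identities, gaps and ambiguous residues
        if CLASS_OF[x] == CLASS_OF[y]:
            conservative += 1
        else:
            radical += 1
    return conservative, radical


# Two toy gene pairs with the same DNA p-distance (one change in nine sites).
# CTT = Leu, ATT = Ile, CGT = Arg in the standard genetic code.
dna_ref, dna_cons, dna_rad = "CTTCTTCTT", "ATTCTTCTT", "CGTCTTCTT"
print(p_distance(dna_ref, dna_cons), p_distance(dna_ref, dna_rad))
print(count_replacements("LLL", "ILL"))  # Leu->Ile
print(count_replacements("LLL", "RLL"))  # Leu->Arg
```

Both toy pairs differ at one nucleotide in nine, yet one replacement stays
within a class and the other crosses classes, which is the distinction a
plain synonymous/non-synonymous ratio would not capture.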
Here is what I have tried so far:
FASTA gives % identity, but no similarity score based on
amino acid structure.
BLAST does not allow gaps, does not necessarily use the
full length of the sequence, and does not score based on
amino acid structure.
protdist can do a structure-based score, but also seems to add in
an estimate of DNA distance. For example, a sequence
of 10 Leucines compared to a sequence of 8 Leucines
and 2 Isoleucines (80% identity and greater than
80% similarity because Leu is similar to Ile) gives
a protdist result of 0.3854.
PIMA does not give the score as a % of total possible score, and
gives different values for identities of different amino
acids (a P=P match scores 7 while a G=G match scores 5).
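
For reference, the kind of output being asked for can be sketched on the
Leu/Ile example above. The set of "similar" residue pairs is a hypothetical
stand-in for whatever a real substitution matrix or classification would
define, and the function name is made up for this illustration.

```python
# Hypothetical set of "similar" residue pairs; a real analysis would derive
# these from a substitution matrix or a physicochemical classification.
SIMILAR = {
    frozenset("LI"), frozenset("LV"), frozenset("IV"),
    frozenset("KR"), frozenset("DE"), frozenset("FY"),
}


def percent_identity_similarity(a, b):
    """Return (% identity, % similarity) over all aligned columns."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    identical = sum(x == y for x, y in zip(a, b))
    # A column with two identical residues yields a one-element frozenset,
    # which is never in SIMILAR, so identities are not counted twice.
    similar = sum(frozenset((x, y)) in SIMILAR for x, y in zip(a, b))
    n = len(a)
    return 100.0 * identical / n, 100.0 * (identical + similar) / n


# 10 Leu against 8 Leu + 2 Ile, as in the protdist example.
print(percent_identity_similarity("L" * 10, "L" * 8 + "I" * 2))
```

Counting a similar pair as a full match puts this example at 80% identity
and 100% similarity; a weighted scheme that gives similar pairs partial
credit would land somewhere between the two.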
I appreciate any advice you might have on finding a program.
|Brian T. Foley btf at t10.lanl.gov |
|HIV Database (505) 665-1970 |
|Los Alamos National Lab http://hiv-web.lanl.gov/index.html |
|Los Alamos, NM 87544 U.S.A. http://www.t10.lanl.gov/~btf/home.html |