In article <9304210131.AA08023 at spider.ento.csiro.au> lizvanp at ento.csiro.au writes:
>I have an alignment between two proteins which covers a region of about 100
>residues. The proteins I am comparing are 200 and 240 residues in size, and
>the region that aligns has 31% identity, but only 20% similarity (according
>to MaxHom, EMBL at Heidelberg). Also the region that aligns contains a
>residue which is involved in the active site, but one of these proteins has
>no alignment in this particular area. Can anyone give me some idea whether
>this level of similarity has any real meaning? Cheers Lis
31% identity over 100 residues seems likely to be significant.
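What is the local similarity (PAM250) score? It is unclear why you
would have lower similarity than identity: every identical pair also
counts as similar, so over the same aligned region the similarity can
never be lower than the identity. The program that calculates
similarity must be requiring a "global" alignment, one that extends
from end to end. This may not be appropriate when only about 100 of
the 200 and 240 residues align.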
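The point that similarity cannot fall below identity over the same
alignment can be checked with a small sketch. It assumes a hypothetical
grouping of "similar" residues (MaxHom's actual definition is not given
here) and counts only ungapped aligned columns; both choices are
illustrative, not what any particular program does.

```python
# Hypothetical residue groups (an assumption, not MaxHom's definition):
# two residues in the same group are counted as "similar".
GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
GROUP_OF = {aa: i for i, g in enumerate(GROUPS) for aa in g}


def percent_identity_similarity(a: str, b: str) -> tuple[float, float]:
    """Return (%identity, %similarity) for two aligned strings ('-' = gap).

    Only columns where neither sequence has a gap are counted.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0, 0.0
    identical = sum(1 for x, y in columns if x == y)
    # An identical pair is always counted as similar, so similar >= identical
    # whenever both are measured over the same columns.
    similar = sum(
        1
        for x, y in columns
        if x == y or (x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y))
    )
    n = len(columns)
    return 100.0 * identical / n, 100.0 * similar / n
```

For example, `percent_identity_similarity("ILKDE", "VLKEQ")` gives
40% identity and 80% similarity. A lower similarity figure only appears
if it is computed over a different (e.g. end-to-end) alignment or
length.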
To test the significance of the similarity score, you could
use the "rss" program, which compares the two sequences using the
Smith-Waterman algorithm, then repeatedly shuffles one of the sequences
and scores each shuffled version, so that the real score can be
compared with scores from random sequences of the same composition.
"rss" is a
derivative of "rdf2," which was described in Pearson and Lipman (1988)
PNAS 85:2444. It is available with the fasta package, which you can
obtain (for unix and VMS machines) from uvaarpa.virginia.EDU:pub/fasta.
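The idea behind "rss" can be illustrated with a minimal sketch. This is
not the rss code: it assumes a toy match/mismatch score and a linear gap
penalty (the values 2, -1 and -2 are arbitrary) rather than a
substitution matrix such as PAM250, and the function names are
hypothetical. It computes a Smith-Waterman local score, shuffles the
second sequence (which keeps its amino acid composition), and reports
how far the real score lies above the shuffled scores.

```python
import random
import statistics


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring values are illustrative assumptions, not PAM250.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def shuffle_test(a: str, b: str, n_shuffles: int = 200, seed: int = 0):
    """Compare the real score with scores of shuffled copies of b.

    Returns (real_score, mean_shuffled, sd_shuffled, z). Requires
    n_shuffles >= 2. z is reported as 0.0 if every shuffled score is equal.
    """
    rng = random.Random(seed)
    real = smith_waterman_score(a, b)
    letters = list(b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # same composition, random order
        scores.append(smith_waterman_score(a, "".join(letters)))
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    z = (real - mean) / sd if sd > 0 else 0.0
    return real, mean, sd, z
```

Usage would look like `real, mean, sd, z = shuffle_test(seq_a, seq_b)`
with `seq_a` and `seq_b` as the two protein sequences. Local alignment
scores of shuffled sequences are not normally distributed (they follow
an extreme-value distribution), so a z-value from this sketch is only a
rough guide.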