Gaps and PAMs

L.A. Moran lamoran at gpu.utcs.utoronto.ca
Sun Jun 28 14:24:35 EST 1992


Gaston H. Gonnet of Informatik, ETH (Zurich) writes,

     "'significant' is related to the probability of an alignment
      being derived from homology as opposed to being random. This
      is measured directly by the score of the alignment when you
      use Dayhoff matrices.  So the highest the score the highest
      the significance."

I want to point out that "significance" is a subjective term. To determine
whether an alignment is significant, you first have to ask whether the
number of matches is greater than what is expected for any two random
sequences, and then you have to decide what level of matching you consider
to be significant. For many amino acid sequence comparisons the
significance is obvious, but problems arise when we are dealing with
alignments that have large numbers of gaps and few identities. To make life
easier for biologists, a number of computer algorithms have been used to
increase alignment "scores". Of course, these programs increase the scores
of random sequences as well. The hope is that the scores of related
proteins on the verge of significance will be increased by a larger amount,
thus moving these scores into the (subjective) "significance" category.
In order to inform readers about subjective criteria of "significance",
one should say what the random scores are, how they were calculated, and
what cutoff point has been selected (and why). It is important that the
random sequences reflect the average amino acid composition of proteins.
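
As a rough illustration of reporting "what the random scores are and how
they are calculated", here is a minimal Python sketch. Everything in it is
an assumption made for the example: the function names are hypothetical,
the scoring is a plain match/mismatch/linear-gap scheme rather than a
Dayhoff matrix, and the random sequences are made by shuffling one of the
two real sequences (which preserves that sequence's composition, not the
average composition of proteins).

```python
import random
import statistics


def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap cost.

    The match/mismatch/gap values are arbitrary illustration values,
    not a Dayhoff matrix.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffled_baseline(a, b, trials=200, seed=0):
    """Scores of `a` against shuffled copies of `b` (same composition as b)."""
    rng = random.Random(seed)
    letters = list(b)
    scores = []
    for _ in range(trials):
        rng.shuffle(letters)
        scores.append(global_score(a, "".join(letters)))
    return scores


def z_score(a, b, trials=200, seed=0):
    """Return (observed, mean, sd, z) relative to the shuffled baseline."""
    observed = global_score(a, b)
    scores = shuffled_baseline(a, b, trials, seed)
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    z = (observed - mean) / sd if sd > 0 else float("nan")
    return observed, mean, sd, z
```

The cutoff applied to the resulting z value is still a choice the author
has to state and justify; the sketch only makes the random baseline
explicit.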

I don't think that it is correct to say that significance "...is measured
directly by the score of the alignment when you use Dayhoff matrices".
A more correct statement would be: "We believe that use of the Dayhoff
matrix reflects some sort of biological reality which allows us to detect
homology which is not otherwise obvious; we believe that values above x
indicate that two sequences are homologous". I have seen several examples of
the misuse of such matrices, where authors claim that two proteins are
homologous on the basis of questionable scores.

Gaston H. Gonnet also says,

    "'distant' is related to how long ago or recently the two
     sequences diverged.  This is measured in PAM units as I
     explained in a recent posting."

Evolutionary distance is actually measured in years or some other unit of
time. When comparing two sequences we can estimate the distance by examining
the degree of similarity. Conceptually, the easiest way to do this is a 
direct comparison of aligned sequences. As soon as you start introducing
gaps into the alignment you have to make subjective decisions about the
value of these gaps. Whenever you start "comparing" non-identical amino acid
residues you have to make subjective decisions about the value of these
"matches". One such subjective decision is to use a Dayhoff matrix. The more 
assumptions you make, the greater the danger of error. We should try very hard
to remember that the output of computer programs (e.g. PAM units) is only as
good as the subjective assumptions that were made in writing the program.
(I am assuming that the program was correctly written.)
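
To make the point about gaps concrete, here is a small Python sketch of the
"direct comparison of aligned sequences". The function name and the "-" gap
character are assumptions made for the example. It reports percent identity
two ways, and the two numbers differ only in how gapped columns are
treated, which is exactly the kind of decision that has to be made and
reported.

```python
def identity_stats(aligned_a, aligned_b, gap="-"):
    """Fraction of identical residues in a pairwise alignment.

    Both inputs must already be aligned (equal length, gaps marked with
    `gap`). Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0   # columns where both residues are the same
    ungapped = 0    # columns where neither sequence has a gap
    columns = 0     # all columns except gap-vs-gap
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue
        columns += 1
        if x != gap and y != gap:
            ungapped += 1
            if x == y:
                identical += 1
    return {
        # gaps ignored entirely
        "identity_ungapped": identical / ungapped if ungapped else 0.0,
        # each gapped column counted as a mismatch
        "identity_all_columns": identical / columns if columns else 0.0,
    }


stats = identity_stats("ACD-EF", "ACDGEY")
```

Neither number is a distance in years or in PAM units; turning identity
into either one requires further assumptions about rates and substitution
patterns.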

What I would really like to see is some serious discussion about the 
usefulness of gap penalties and mutation matrices. How confident can we be
that marginally significant scores actually reflect evolutionary relatedness?

Has anyone looked closely at the relationship between alignment programs
and mutation matrices? My own experience indicates that no alignment program
is capable of aligning multiple sequences as well as an intelligent human.
Those programs that simply align pairs of sequences often produce results that
are very different from a serious multiple sequence alignment. I assume that
when constructing a Dayhoff matrix, only identical amino acids are counted
in the initial alignment but that gaps are permitted. Is this correct?
 
Laurence A. Moran (Larry)
Dept. of Biochemistry



