Gaps and PAMs (How Dayhoff computed its log-odds matrix)
gonnet at inf.ethz.ch
Wed Jul 1 04:33:54 EST 1992
In article <1992Jun30.024538.3684 at gpu.utcs.utoronto.ca> lamoran at gpu.utcs.utoronto.ca (L.A. Moran) writes:
> "I assume that when constructing a Dayhoff matrix only identical
> amino acids are counted in the initial alignment but that gaps
> are permitted. Is this correct?"
> "no, you are mistaken, please read Dayhoff's original paper, the
> procedure is much more sophisticated. If you would understand their
> ideas, you would be much more confident in using their tools."
>How interesting. When you construct a new "Dayhoff" matrix do you use the
>old one to improve the alignments that form the database? If not, then what
>"sophisticated" assumptions do you make that justify comparing non-identical
>residues in the original alignments? Do you think that these assumptions
>might affect the final matrix?
As a matter of fact, Dayhoff et al. counted only mutated positions,
not "identical amino acids" as you say.
There is a circularity, as you note. To compute a better estimate
of the matrix you need alignments, and to produce alignments you need
a good matrix. Dayhoff et al. broke this circularity by computing the
sampled alignments by hand. We do it iteratively. It is possible
to prove that when you select pairs of sequences which are not too
far apart (in PAM distance), the process of aligning, estimating DM,
aligning with the new matrix, estimating again, and so on converges.
The only requirement for convergence is that you start with a
diagonally dominant mutation matrix. Because the final matrix that we
found was the same (up to roundoff error) for 3 initial matrices
(Dayhoff's original one, a very early estimate of our own, and an
identity matrix), we have sufficient confidence that the procedure is
sound.
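The align/estimate loop described above can be sketched in a few lines
of Python. This is only a toy illustration, not the procedure actually
used: the 4-letter alphabet, the linear gap penalty, the pseudocount,
the 10*log10 scaling, the sample sequences, and every function name
(align, estimate_scores, iterate, identity_scores) are assumptions made
for the example. It also counts all aligned pairs and skips the
PAM-distance estimation entirely.

```python
import math
from collections import Counter

ALPHABET = "ACDE"  # toy alphabet (assumption); real matrices use 20 amino acids


def identity_scores(match=4.0, mismatch=-2.0):
    """A diagonally dominant starting score matrix (hypothetical values)."""
    return {(x, y): match if x == y else mismatch
            for x in ALPHABET for y in ALPHABET}


def align(a, b, score, gap=-4.0):
    """Global (Needleman-Wunsch) alignment with a linear gap penalty.

    Returns the aligned residue pairs in sequence order; gapped
    positions are skipped.
    """
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score[a[i - 1], b[j - 1]],
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trace back from the corner, preferring the diagonal move on ties.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score[a[i - 1], b[j - 1]]:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def estimate_scores(pair_counts, pseudo=1.0):
    """Symmetric log-odds scores, 10*log10(p_xy / (f_x * f_y))."""
    joint = {(x, y): pair_counts[x, y] + pair_counts[y, x] + pseudo
             for x in ALPHABET for y in ALPHABET}
    total = sum(joint.values())
    freq = {x: sum(joint[x, y] for y in ALPHABET) / total for x in ALPHABET}
    return {(x, y): 10 * math.log10((joint[x, y] / total) / (freq[x] * freq[y]))
            for x in ALPHABET for y in ALPHABET}


def iterate(seq_pairs, score, rounds=10, tol=1e-9):
    """Align, re-estimate, and repeat until the scores stop changing."""
    for _ in range(rounds):
        counts = Counter()
        for a, b in seq_pairs:
            counts.update(align(a, b, score))
        new_score = estimate_scores(counts)
        delta = max(abs(new_score[k] - score[k]) for k in new_score)
        score = new_score
        if delta < tol:
            break
    return score


# Made-up, closely related sequence pairs (not real data).
toy_pairs = [("ACDEACDE", "ACDEACDE"),
             ("AACCDDEE", "AACCDDEE"),
             ("ACDEACDE", "ACDEACDA")]
final_a = iterate(toy_pairs, identity_scores(4.0, -2.0))
final_b = iterate(toy_pairs, identity_scores(2.0, -1.0))
```

On data this small both starting matrices lead to the same alignments,
so final_a and final_b agree. That only shows the shape of the loop; it
says nothing about convergence on real sequence databases.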
I hope this answers your questions.
Gaston H. Gonnet, Informatik, ETH, Zurich.