Gaps and PAMs
gonnet at inf.ethz.ch
Mon Jun 29 05:00:48 EST 1992
In article <1992Jun28.192435.26352 at gpu.utcs.utoronto.ca> lamoran at gpu.utcs.utoronto.ca (L.A. Moran) writes:
>I want to point out that "significance" is a subjective term.
Yes, I agree, but we cannot do science with "subjective terms". The
least controversial definition of "significance" is one which compares
the probability of a homology against the (null hypothesis) probability
of a random coincidence. As the model of homology gets more precise,
or as you start including information of another nature (e.g. 3-d structure),
the probabilities may be computed differently. But the definition
remains the same.
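To make the definition above concrete, here is a minimal sketch in Python. It assumes a toy two-letter alphabet, and the frequencies in BACKGROUND and JOINT are hypothetical numbers chosen only so that they are consistent with each other; they are not taken from any real matrix. The names log_odds_score, BACKGROUND and JOINT are illustrative. The score is the familiar 10*log10 of the odds of homology against random coincidence, summed over the columns of an ungapped alignment.

```python
import math

# Hypothetical numbers for illustration only (not from any real matrix).
# BACKGROUND[a]: frequency of residue a in random sequences (null model).
BACKGROUND = {"A": 0.6, "W": 0.4}
# JOINT[(a, b)]: probability of seeing a aligned with b in homologous
# sequences. The marginals of JOINT equal BACKGROUND.
JOINT = {("A", "A"): 0.5, ("A", "W"): 0.1,
         ("W", "A"): 0.1, ("W", "W"): 0.3}

def log_odds_score(seq1, seq2):
    """Return 10*log10( P(columns | homology) / P(columns | random) )
    for two already aligned, ungapped sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    score = 0.0
    for a, b in zip(seq1, seq2):
        p_homology = JOINT[(a, b)]
        p_random = BACKGROUND[a] * BACKGROUND[b]
        score += 10 * math.log10(p_homology / p_random)
    return score

def odds_ratio(score):
    """Convert a 10*log10 score back into a plain odds ratio."""
    return 10 ** (score / 10)
```

A positive score means the alignment is more probable under the homology model than under random coincidence; a negative score means the opposite. A more precise model would change how p_homology is computed, not the form of the ratio.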
>Evolutionary distance is actually measured in years or some other unit of
>time. When comparing two sequences we can estimate the distance by examining
>the degree of similarity.
I beg to disagree. Evolutionary distance, as shown by Dayhoff and many
others, is best measured in PAM units or any other unit of mutation.
The reason is simple: given just the sequences, we can estimate
their evolutionary distance (ED) directly, but we cannot estimate their
time-distance without considering at least 3 of the biases which affect
the relation between amount of evolution and time. These are:
(a) species reproduce at very different rates
(b) crucial proteins mutate much more slowly than less important
proteins (due to a strong natural selection)
(c) changes in the environment "force" some rapid mutations.
So it would be nice to measure time, but we can at best measure
amount of evolution (amount of change).
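As a rough illustration of why the sequences give an amount of change and not a time, here is a minimal sketch in Python. Everything in it is an assumption made for the example: a toy two-letter alphabet, hypothetical frequencies FREQ, and a hypothetical one-PAM transition matrix PAM1 scaled so that 1% of positions are expected to change. The estimate simply matches the observed fraction of identical positions; this is a simplified stand-in for illustration, not the procedure of Dayhoff or of any other author.

```python
# Toy 2-letter alphabet; all numbers are hypothetical.
FREQ = [0.6, 0.4]
# PAM1[i][j]: probability that letter i is replaced by letter j after
# one PAM unit. Rows sum to 1, and the expected change is
# 0.6 * (0.005 / 0.6) + 0.4 * (0.005 / 0.4) = 0.01, i.e. 1%.
PAM1 = [[1 - 0.005 / 0.6, 0.005 / 0.6],
        [0.005 / 0.4, 1 - 0.005 / 0.4]]

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][x] * b[x][j] for x in range(n)) for j in range(n)]
            for i in range(n)]

def expected_identity(k):
    """Expected fraction of unchanged positions at distance k PAM."""
    power = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix = PAM 0
    for _ in range(k):
        power = mat_mul(power, PAM1)
    return sum(FREQ[i] * power[i][i] for i in range(len(FREQ)))

def estimate_pam(observed_identity, max_pam=1000):
    """Smallest k whose expected identity has dropped to the observed
    identity, or None if that never happens within max_pam."""
    for k in range(max_pam + 1):
        if expected_identity(k) <= observed_identity:
            return k
    return None
```

Note that no clock appears anywhere in the calculation. Two pairs of sequences with the same estimated PAM distance may have diverged over very different spans of time, for the reasons (a) to (c) above.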
>Conceptually, the easiest way to do this is a
>direct comparison of aligned sequences. As soon as you start introducing
>gaps into the alignment you have to make subjective decisions about the
>value of these gaps. Whenever you start "comparing" non-identical amino acid
>residues you have to make subjective decisions about the value of these
Making subjective decisions about the values of gaps is what has been
done until recently. We have now given a model under which the parameters
can be computed from the available samples. I am afraid that you
tend to imply that alignment is "black magic" or "art". I disagree
strongly with this view. We should establish models, compute the
parameters for these models, verify/reject the models against reality
and move on to better models when the old ones become unsuitable to
describe reality. This is the way that science makes progress, not
with "subjective measures". There are hundreds of examples of this
methodology in science.
>What I would really like to see is some serious discussion about the
>usefulness of gap penalties and mutation matrices. How confident can we be
>that marginally significant scores actually reflect evolutionary relatedness?
I will send you a preprint on our deletion model.
> I assume that
>when constructing a Dayhoff matrix only identical amino acids are counted
>in the initial alignment but that gaps are permitted. Is this correct?
No, you are mistaken. Please read Dayhoff's original paper; the procedure
is much more sophisticated. If you understood their ideas, you would
be much more confident in using their tools. (I have to agree, though, that
their paper is not very well written and is difficult to understand, and by
today's standards it has a few flaws.)
Gaston H. Gonnet, Informatik, ETH, Zurich.