Gap penalties, PAM matrices and so on

Fri Jun 26 16:13:45 EST 1992

Mark Cohen said:
> >- "mutation matrices ... differ, depending on whether they were derived
> >  from protein pairs that are distantly homologous or from protein pairs
> >  that are closely homologous". What a discovery !!
> >- how can anyone align confidently protein sequences that are "distantly
> >  homologous" and use the results to build a matrix ?
> We did not align "distantly homologous" and build a matrix from the results
> We aligned all the proteins in the data base with all the others.
> [...]

OK, but the question still stands: how does one confidently align
those segments that are "distantly homologous" (your term)?

> Where the scores obtained (using Dayhoff's 1978 matrix) indicated
> that the alignments were significant (ie that the probability of the
> alignment was significantly higher than alignment of two random
> sequences) these alignments were used in the construction of the matrix.

So, then, "distant" means "significant"?
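
To make the "significantly higher than alignment of two random
sequences" criterion concrete, here is a minimal sketch of one common way
to test it: score the real pair, then score the first sequence against
shuffled copies of the second and express the real score as a z-score.
This is an illustration only, not the procedure from the paper. The
match/mismatch/gap values, the function names and the shuffle-based null
model are all assumptions; the work under discussion used Dayhoff's 1978
matrix rather than this toy scoring scheme.

```python
import random
import statistics


def align_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score with a toy scoring scheme.

    The match/mismatch/gap values are illustrative assumptions, not a
    Dayhoff matrix.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                           prev[j] + gap,       # gap in b
                           cur[j - 1] + gap))   # gap in a
        prev = cur
    return prev[-1]


def z_score(a, b, trials=200, seed=0):
    """Z-score of the real alignment score against shuffled versions of b."""
    rng = random.Random(seed)
    observed = align_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(trials):
        rng.shuffle(letters)  # keeps composition, destroys order
        null_scores.append(align_score(a, "".join(letters)))
    mu = statistics.mean(null_scores)
    sigma = statistics.pstdev(null_scores)
    if sigma == 0:
        return float("nan")  # null distribution has no spread
    return (observed - mu) / sigma
```

Under a scheme like this, "distantly homologous" would correspond to a
z-score above some chosen threshold but well below that of obviously
related pairs; where to put those thresholds is exactly the judgment
call being questioned here.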

> >- what are "distantly homologous" proteins ? [...]
> [Tautology deleted]  Proteins for which the alignment score is high
> enough above the score of aligned random sequences yet not so high
> as to be unambiguously related.

If only that had been stated in the paper...

> >- what is the influence of the enormous redundancy found in protein
> >  databanks (hundreds of cytochromes, thousands of histones, zillions of
> >  globulins, ...)

> We will in future publish the matrices calculated with, without and only for
> the immunoglobulins.  The results do not change our opinion significantly.

Ditto. Just that single sentence would have helped.

> >- the explanation for the -3/2 power concerning the probability of
> > a gap is a joke ? 

> The k^-3/2 term is an experimental result.  The probability of the
> two ends of a chain being close in space is dependent on the length
> of the chain as described in the paper, or you can read Flory's book
> on polymers. 

I can't recall where I heard this (perhaps they will post), but others
have also found this result.
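
For readers who want to see what a k^-3/2 term implies in practice, the
sketch below turns a gap-length distribution proportional to k^-3/2 into
a length-dependent penalty by taking the negative logarithm. It is only
an illustration of the arithmetic. The opening cost, the 10*log10
scaling, the truncation at a maximum length and the function names are
assumptions made for the example and are not values from the paper.

```python
import math


def gap_length_probs(max_len, exponent=1.5):
    """Probabilities for gap lengths 1..max_len, proportional to k**-exponent.

    Truncating at max_len is an assumption made so the weights can be
    normalized over a finite range.
    """
    weights = [k ** -exponent for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gap_penalty(k, open_cost=10.0, exponent=1.5, scale=10.0):
    """Penalty for a gap of length k under a k**-exponent length law.

    penalty(k) = open_cost - scale * log10(k**-exponent)
               = open_cost + scale * exponent * log10(k)

    open_cost and scale are hypothetical values chosen for illustration.
    """
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return open_cost + scale * exponent * math.log10(k)
```

The point to notice is that the penalty grows with the logarithm of the
gap length rather than linearly, so long gaps cost much less than they
would under a fixed per-residue extension penalty.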

> >        Well, I prefer to stop here. May I draw your attention on the paper
> >by Jones, Taylor and Thornton in the last CABIOS issue ? Their aim was also
> >to build an updated Dayhoff matrix. They did it, with the difference that
> >their procedure is crystal clear. And that, by necessity, their matrix was
> >not built with "distantly homologous proteins".

> Jones et al found like us that the differences between the Dayhoff
> 1978 matrix and the recalculated matrix were largest for the least
> common amino acid pairs, eg W-Y or W-C etc.  Their paper is somewhat
> longer than ours hence their more detailed explanation.

I don't think so; I think it was just a matter of presentation.  Just a
few sentences in Dr. Cohen's post have cleared up a number of
misunderstandings.


dr. dan davison/dept. of biochemical and biophysical sciences/univ. of
