Sequence/Structure Comparisons

Barbara K. Moore Bryant barb at ai.mit.edu
Fri Sep 10 09:34:12 EST 1993



In reply to Simon Brocklehurst's message about sequence/structure
comparisons:

(1) How much better is a sequence/structure comparison than aligning
(possibly multiple) sequences to a known-structure sequence?

The claim of the sequence/structure matchers is that they can catch
structural similarities between proteins whose sequences are very
different, and therefore a sequence alignment approach wouldn't work.
Most groups give one or a few such matches as examples.
Bowie/Luthy/Eisenberg find such matches between CRP and the
cAMP-dependent protein kinase family, and between actin and HSC70.
Jones/Taylor/Thornton find that C-phycocyanin is globin-like.

Also the sequence/structure matching approach has the potential of
matching a sequence to an as-yet-unseen structure which has been
postulated, perhaps by combining pieces of super-secondary structure.
(Temple Smith's group, for example, discussed this possibility in a
poster at Waterville Valley.)  This is relevant because there are
estimates that we've seen a sizable chunk but not all of the possible
structures.


(2) Relative merits of various approaches

There are several issues to look at.  

-- EVALUATION FUNCTION.  What is the evaluation function for a given
threading of a sequence onto a structure?  (This is what Simon pointed
to in the Blundell approach.)  Each group seems to have extensive
justification for their approach to collecting statistics and turning
them into an evaluation function.  It's really hard to say which is
better; the true test should probably be how well each performs on
experiments.  Wouldn't it be nice if there were universally used
well-defined sets of experiments so we could do this comparison easily?
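
To make "evaluation function" concrete, here is a minimal sketch in
Python.  It is not any particular group's function: the names
(threading_score, placement, environments, contacts) and the two score
tables are hypothetical, and "higher is better" is an assumption.  It
just adds a per-position (singleton) term and a pairwise contact term
for one fixed threading.

```python
def threading_score(sequence, placement, environments, contacts,
                    singleton, pairwise):
    """Score one threading of `sequence` onto a structure (illustrative only).

    placement[p]    -- index into `sequence` of the residue placed at
                       structure position p, or None if p is left empty
    environments[p] -- hypothetical environment label of position p
    contacts        -- pairs (p, q) of structure positions in contact
    singleton       -- dict: (environment, amino acid) -> score
    pairwise        -- dict: (aa1, aa2) with aa1 <= aa2 -> score
    """
    total = 0.0
    # Singleton term: how well each residue fits its position's environment.
    for p, i in enumerate(placement):
        if i is not None:
            total += singleton.get((environments[p], sequence[i]), 0.0)
    # Pairwise term: depends on which residues land at BOTH ends of a contact.
    for p, q in contacts:
        i, j = placement[p], placement[q]
        if i is not None and j is not None:
            pair = tuple(sorted((sequence[i], sequence[j])))
            total += pairwise.get(pair, 0.0)
    return total


# Toy tables, invented for the example.
singleton = {("buried", "A"): 1.0, ("exposed", "K"): 0.5, ("buried", "L"): 2.0}
pairwise = {("A", "L"): 1.5}
environments = ["buried", "exposed", "buried"]
contacts = [(0, 2)]

full = threading_score("AKL", [0, 1, 2], environments, contacts,
                       singleton, pairwise)
gapped = threading_score("AKL", [0, None, 2], environments, contacts,
                         singleton, pairwise)
```

The groups differ in how they collect statistics to fill tables like
these, and in whether they include the pairwise term at all.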

-- MATCHER SPEED.  How long does the algorithm take to run?  The
critical question: is it possible to match each sequence in the
sequence database to each structure in the model structure set in a
reasonable time, say, a month?  Profiles (Eisenberg) and Substitution
Pattern (Blundell) use dynamic programming (DP) algorithms.  Other
methods (Godzik & Skolnick, Bryant & Lawrence, Jones & Taylor, Smith
lab) incorporate pairwise interactions and therefore can't use a
straight DP algorithm, but rather something that takes a lot longer to
run (like the two-stage DP of Jones & Taylor, or the iterated frozen
DP of Godzik & Skolnick, or the exhaustive search of Bryant &
Lawrence).  Lathrop & Smith have the fastest optimal search algorithm
for evaluation functions that include pairwise interactions, but it
still takes significantly longer than DP.
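
For the DP case, a minimal sketch in Python, assuming the evaluation
function has only per-position (singleton) terms plus a linear gap
penalty.  This is a generic global-alignment recurrence, not the
Eisenberg or Blundell code; thread_by_dp, the score table, and the gap
value are all made up for illustration.

```python
def thread_by_dp(sequence, environments, singleton, gap=-1.0):
    """Best score for aligning `sequence` to structure positions.

    Assumes the score is a sum of independent per-position terms, so the
    best alignment of a prefix never depends on choices made elsewhere.
    singleton maps (environment label, amino acid) -> score.
    """
    n, m = len(sequence), len(environments)
    # best[i][j]: best score aligning the first i residues
    # to the first j structure positions.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap          # residues left unplaced
    for j in range(1, m + 1):
        best[0][j] = j * gap          # positions left empty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            fit = singleton.get((environments[j - 1], sequence[i - 1]), 0.0)
            best[i][j] = max(best[i - 1][j - 1] + fit,   # place residue i at j
                             best[i - 1][j] + gap,       # skip residue i
                             best[i][j - 1] + gap)       # skip position j
    return best[n][m]


# Toy score table, invented for the example.
toy_scores = {("buried", "A"): 1.0, ("buried", "L"): 2.0}
exact = thread_by_dp("AL", ["buried", "buried"], toy_scores)
with_gap = thread_by_dp("AKL", ["buried", "buried"], toy_scores)
```

The table has (n+1) x (m+1) cells, so the cost is proportional to
sequence length times structure length.  A pairwise term breaks the
recurrence because the score at one position then depends on which
residue was placed at a distant position.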

-- EXPERIMENTS.  What experiments have the authors run, and are they
meaningful?  Some results seem clearly not meaningful (e.g., not
allowing gaps in the sequence/structure alignment and finding that the
sequence always finds its own structure).  Other results could very
well be found by other means, like sequence alignment.


(3) Are articles in prestigious journals given more credence?

Maybe so.  Also consider institutional affiliation, research
history of authors, funding, & various other political factors.


Other questions of interest:
---------------------------

(1) Modeling the structure of a protein could be broken down into two
steps: choosing the right structure from a structure library, then
optimally placing the sequence on the structure (followed by side
chain fitting, loop packing, energy minimization).  The
sequence/structure matchers tend to combine these steps: they
optimally thread the sequence onto each structure (second step first),
and then compare the scores of these optimally threaded matches.  One
interesting question: how well can you do at choosing the right
structure for a sequence (from a structure library) without doing the
full threading?  You might use really simple measures like length and
amino acid composition, or more involved things like the
Stultz/White/Smith Markov models that were designed to determine the
most likely fold for a sequence.  
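
As an illustration of the "really simple measures" end of the range,
here is a minimal Python sketch that ranks library entries by length
and amino acid composition alone.  The distance formula, the
length_weight parameter, and the function names are assumptions made
up for the example; whether such a filter picks the right structure
often enough is exactly the open question.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in `seq`."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def prefilter_distance(query, template_seq, length_weight=1.0):
    """Hypothetical distance: composition difference plus length difference."""
    comp_d = sum(abs(x - y) for x, y in
                 zip(composition(query), composition(template_seq)))
    longest = max(len(query), len(template_seq), 1)
    len_d = abs(len(query) - len(template_seq)) / longest
    return comp_d + length_weight * len_d


def rank_templates(query, library):
    """Sort library names (name -> sequence) from closest to farthest."""
    return sorted(library,
                  key=lambda name: prefilter_distance(query, library[name]))


# Toy library, invented for the example.
library = {"same": "AAAA", "different": "WWWWWWWW"}
ranking = rank_templates("AAAA", library)
```

The full threading would then be run only on the top few entries of
the ranking.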

(2) How well can you do on sequence/structure matching if you ignore
pairwise interactions between residues (which lets you use faster
dynamic programming algorithms to optimize your evaluation function)?

(3) What test suite might we propose to compare the various
approaches?  I think it would be great, among other things, to have a
facility whereby sequences of structures that are about to be
published are made public before the structures come out.  This might
be administered by the structure database people or by someone else.
Everyone who wanted could run their predictor on the sequence, and
then see how well they did when the structure appears.  
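
One possible way to score such a blind test, sketched in Python: each
predictor submits a ranked list of folds per target, and once the
structures appear we count how often the true fold is in the top k.
The metric, the function names, and the fold labels are assumptions
for illustration, not an agreed-on standard.

```python
def rank_of_true_fold(ranked_folds, true_fold):
    """1-based rank of the true fold in a ranked list, or None if absent."""
    try:
        return ranked_folds.index(true_fold) + 1
    except ValueError:
        return None


def top_k_rate(predictions, truths, k=1):
    """Fraction of targets whose true fold is among the top k predictions.

    predictions -- dict: target name -> ranked list of fold names
    truths      -- dict: target name -> fold name revealed later
    """
    if not predictions:
        return 0.0
    hits = 0
    for target, ranked in predictions.items():
        rank = rank_of_true_fold(ranked, truths[target])
        if rank is not None and rank <= k:
            hits += 1
    return hits / len(predictions)


# Toy submissions, invented for the example.
predictions = {"target1": ["globin", "TIM barrel"],
               "target2": ["TIM barrel", "globin"]}
truths = {"target1": "globin", "target2": "globin"}
top1 = top_k_rate(predictions, truths, k=1)
top2 = top_k_rate(predictions, truths, k=2)
```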


A couple references (reviews):

Fetrow & Bryant, "New programs for protein tertiary structure
prediction," Bio/Technology 11:479-484, 1993.

Blundell & Johnson, "Catching a common fold," Protein Science
2:877-883, 1993.


