RNA secondary structure similarity identification

Eddy S. sre at al.cam.ac.uk
Fri Jan 14 03:44:25 EST 1994

In article <1994Jan13.191610.15552 at alw.nih.gov> bergsage at helix.nih.gov (Peter Leif Bergsagel) writes:
  >I have the sequence of different stretches of RNA that bind to a protein. I
  >am trying to develop a consensus RNA binding site. I suspect that it is
  >related to the secondary (or tertiary) structure. When I plot these out
  >using FOLD I have a very hard time comparing one structure to another, let
  >alone a whole series. Are there any suggestions for how to determine the
  >consensus site sequence?


Yeah, it's a hard problem. There's no public software out there for
the problem, that I know of. You might have a look at work by Danielle
Konings (such as JMB 207:597-614 1989), Bruce Shapiro (such as CABIOS
6:309-318 1990), and Michael Zuker (such as J. Biomol. Struct. Dynamics
8:1027-1044 1991). Bruce Shapiro is near you, I think, at NCI in
Frederick. Basically, there are two tricks that are used. One approach
encodes single suboptimal RNA structure predictions as strings (>,<
symbols for base pairs), then compares those string representations
looking for common structures. A second approach represents suboptimal
structure predictions as trees, then uses tree-comparison algorithms
to compare the many possible structures for each sequence and find a
common tree.
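
As a rough illustration of the string trick described above, here is a
minimal Python sketch. It is not the method of any of the cited papers.
It assumes ">" marks the 5' partner of a base pair, "<" the 3' partner,
and any other character an unpaired base; the helper names (parse_pairs,
shape, common_shapes) are made up for this example. Each structure string
is collapsed to a coarse "shape" (one bracket pair per uninterrupted
stem), and the shapes that occur among the suboptimal predictions of
every sequence are reported.

```python
def parse_pairs(structure, open_sym=">", close_sym="<"):
    """Return the sorted (i, j) base pairs of a nested structure string."""
    stack, pairs = [], []
    for pos, sym in enumerate(structure):
        if sym == open_sym:
            stack.append(pos)
        elif sym == close_sym:
            if not stack:
                raise ValueError(f"unmatched {close_sym!r} at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched {open_sym!r} at position {stack[-1]}")
    return sorted(pairs)


def shape(structure):
    """Collapse a structure string to a coarse shape such as '[[][]]'.

    Unpaired bases are dropped and directly stacked pairs (i, j) and
    (i-1, j+1) are merged into one stem. A bulge or interior loop
    starts a new stem under this (assumed, deliberately strict) rule.
    """
    partner = {}
    for i, j in parse_pairs(structure):
        partner[i], partner[j] = j, i
    out = []
    for pos, sym in enumerate(structure):
        if sym == ">":
            # Emit '[' only for the outermost pair of a stacked run.
            if partner.get(pos - 1) != partner[pos] + 1:
                out.append("[")
        elif sym == "<":
            # Emit ']' only when the 5' partner starts a stacked run.
            if partner.get(partner[pos] - 1) != pos + 1:
                out.append("]")
    return "".join(out)


def common_shapes(candidates_per_sequence):
    """Shapes found among the candidate structures of every sequence."""
    shape_sets = [{shape(s) for s in cands} for cands in candidates_per_sequence]
    return set.intersection(*shape_sets) if shape_sets else set()


# Hypothetical suboptimal predictions for two sequences.
candidates = [
    [">>>...<<<", "........."],
    [">>....<<", ">..<...."],
]
print(common_shapes(candidates))
```

The same nesting information could be loaded into a tree (one node per
stem) and handed to a tree-comparison routine; the shape string is just
the cheapest way to compare structures from sequences of different
lengths.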

Depending on how many sequences you have and how long they are, I
might be able to help out. Recently, we (Richard Durbin and I,
here in Cambridge) and David Haussler's group at UC Santa Cruz have
independently hit on what looks like a good and general way to model
RNA secondary structure and primary sequence simultaneously. Both
approaches are based on "stochastic context-free grammars", which let
you deal with long-range pairwise sequence correlations like those
caused by RNA secondary structure.  Both groups have developed
algorithms for doing RNA structure-based multiple alignment and
database searching.  Furthermore, since I came from a lab in Boulder
that does in vitro RNA evolution experiments, I've been concentrating
on exactly your problem of inferring consensus structures from a set
of unaligned sequences, by automating the process of comparative
analysis. I've got a secondary structure prediction algorithm that
performs with pretty much 100% accuracy on the small RNA sets we've
looked at, with the caveat that we still have to deal with pseudoknots
somewhat manually.  Trouble is, at the moment the computational burden
is high and the number of sequences necessary is large; if you've got
at least 20 sequences or so (the more the better) and they're less
than 150 nt or so, I could have a good go at them.
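
To make the grammar idea a bit more concrete, here is a toy Python
sketch of how a stochastic context-free grammar can score a base pair
as a single unit. It is not the algorithm from either preprint. The
grammar (S -> x S y | x S | S x | end), every probability below, and the
function name best_parse are assumptions made up for illustration, and
no minimum hairpin loop length is enforced. A CYK-style dynamic program
finds the most probable parse of one sequence and reads the structure
off the parse.

```python
import math

NEG_INF = float("-inf")

# Toy parameters; every number here is invented for the example.
T_PAIR, T_LEFT, T_RIGHT, T_END = 0.5, 0.2, 0.2, 0.1
SINGLE = {base: 0.25 for base in "ACGU"}
PAIR = {("G", "C"): 0.25, ("C", "G"): 0.25,
        ("A", "U"): 0.20, ("U", "A"): 0.20,
        ("G", "U"): 0.05, ("U", "G"): 0.05}


def best_parse(seq):
    """Return (structure, log_prob) of the most probable parse of seq.

    seq must be upper-case RNA (A, C, G, U). score[i][j] holds the best
    log-probability of deriving seq[i:j] from the nonterminal S.
    """
    n = len(seq)
    score = [[NEG_INF] * (n + 1) for _ in range(n + 1)]
    choice = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        score[i][i] = math.log(T_END)  # S -> end on an empty span
    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j = i + span
            options = [
                # S -> x S : emit seq[i] unpaired on the left
                (math.log(T_LEFT * SINGLE[seq[i]]) + score[i + 1][j], "left"),
                # S -> S x : emit seq[j-1] unpaired on the right
                (math.log(T_RIGHT * SINGLE[seq[j - 1]]) + score[i][j - 1], "right"),
            ]
            p = PAIR.get((seq[i], seq[j - 1]))
            if span >= 2 and p:
                # S -> x S y : emit both ends together as one correlated pair
                options.append((math.log(T_PAIR * p) + score[i + 1][j - 1], "pair"))
            score[i][j], choice[i][j] = max(options)
    # Trace back the chosen productions to recover the structure string.
    structure = ["."] * n
    i, j = 0, n
    while i < j:
        if choice[i][j] == "pair":
            structure[i], structure[j - 1] = ">", "<"
            i, j = i + 1, j - 1
        elif choice[i][j] == "left":
            i += 1
        else:
            j -= 1
    return "".join(structure), score[0][n]


structure, log_prob = best_parse("GGGAAACCC")
print(structure, log_prob)
```

The point of the pair production is that the two ends of a span are
emitted jointly, so a G at one end and a C at the other score better
together than as two independent unpaired bases, no matter how far
apart they sit in the sequence.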

A preprint from David Haussler's group (I hope they don't mind me
advertising it) is available from ftp.cse.ucsc.edu, in pub/rna.  You
can ftp a preprint of our paper from cele.mrc-lmb.cam.ac.uk, in
pub/cove/cove_preprint.ps.Z.  I'd be happy to chat more by email if
you're interested, and you might get in touch with David to see if
he's working on any consensus structure prediction stuff in Santa
Cruz.

- Sean

- Sean Eddy
- Laboratory of Molecular Biology, MRC, Cambridge UK
- sre at mrc-lmb.cam.ac.uk
