The following special event on molecular biology/information theory may be of
interest to those attending the Annual Neuroscience Conference in Washington,
D.C. in November.
The Biological Information Theory
And Chowder Society
will press its luck by having a pot luck at the NCBI. Since we didn't mess up
the place too badly last time, we will have another of our infamous chowder
Thursday, November 11 in the NCBI's 8th floor library at the
National Library of Medicine Lister Hill Center.
Everyone should bring their
own dinner with enough to trade with others for additional courses. There is a
soda machine and coffee will be available. You should bring any other beverage
you may want. (Although alcoholic beverages are not permitted in the building,
we understand the chowder may have a somewhat higher than usual vapor pressure.)
We try to put the new guys first (that way they aren't as likely to sneak out),
so Chip Lawrence from the Biometrics Lab, Wadsworth Labs, Albany, NY, and the
National Center for Biotechnology Information will go first and attempt to lead
a discussion on the methodology in his recent Science article.
author = "Charles E. Lawrence and Stephen F. Altschul and Mark S. Boguski and
Jun S. Liu and Andrew F. Neuwald and John C. Wootton",
title = "Detecting Subtle Sequence Signals:
A Gibbs Sampling Strategy for Multiple Alignment",
journal = "Science",
volume = "262",
pages = "208-214",
year = 1993}
Then Tom Schneider will soak up the remainder of the time (and probably the
chowder too) giving a brief reprise of his presentation to the Washington
Evolutionary Systems Society (WESS) on Information Theory and Molecular
Recognition and discuss his latest article.
author = "Peter P. Papp and Dhruba K. Chattoraj and Thomas D. Schneider",
title = "Information Analysis of Sequences that Bind the Replication
journal = "J. Mol. Biol.",
volume = "233",
number = "2",
pages = "219-230",
year = 1993,
comment = "color logos on the cover!"}
The talking points for Chip Lawrence's discussion:
A Gibbs Sampler for the Detection of Subtle Motifs in Multiple Sequences
A new statistical algorithm is described that aligns sequences using predictive
inference. Using residue frequencies, this Gibbs sampling algorithm
iteratively selects alignments in accordance with their conditional
probabilities. The newly formed alignments in turn update the evolving residue
frequency model. When equilibrium is reached the most probable alignment can
be identified. If a detectable pattern is present, we have found convergence
is rapid. Effectively, the algorithm finds optimal local alignments from
multiple sequences in linear time (seconds on current workstations). Its use
is illustrated on test sets of lipocalins and prenyltransferases.
Some information theory questions to be addressed:
(1) What is missing information?
(2) What is the incomplete data log likelihood?
(3) How do these relate to R_seq and R_freq?
(4) What is a "Gibbs" Sampler?
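For those who want something concrete to argue about over the chowder, the loop described above (leave one sequence out, rebuild the residue frequency model from the rest, then resample that sequence's site in proportion to its conditional probability) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the code from the Science article: it assumes a DNA alphabet of uppercase A, C, G, T, exactly one site per sequence, a fixed motif width, and a simple pseudocount. The function name and its parameters are hypothetical.

```python
import random
from collections import Counter

ALPHABET = "ACGT"  # assumption: uppercase DNA; the article works with proteins


def gibbs_motif_sampler(seqs, width, iterations=200, pseudocount=1.0, seed=0):
    """Return one sampled motif start per sequence (hypothetical helper)."""
    rng = random.Random(seed)

    # Start from a random site in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]

    # Background residue frequencies, estimated from all sequences.
    bg_counts = Counter("".join(seqs))
    bg_total = sum(bg_counts[a] for a in ALPHABET) + len(ALPHABET) * pseudocount
    background = {a: (bg_counts[a] + pseudocount) / bg_total for a in ALPHABET}

    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Predictive step: build the position-specific frequency model
            # from the current sites of every sequence except sequence i.
            counts = [{a: pseudocount for a in ALPHABET} for _ in range(width)]
            for j, other in enumerate(seqs):
                if j == i:
                    continue
                for k in range(width):
                    counts[k][other[starts[j] + k]] += 1
            norm = (len(seqs) - 1) + len(ALPHABET) * pseudocount

            # Sampling step: weight each candidate start in sequence i by the
            # ratio of model probability to background probability.
            weights = []
            for pos in range(len(seq) - width + 1):
                w = 1.0
                for k in range(width):
                    a = seq[pos + k]
                    w *= (counts[k][a] / norm) / background[a]
                weights.append(w)

            # Draw the new start in proportion to the weights, rather than
            # taking the maximum; this is what makes it a sampler.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]

    return starts
```

Calling gibbs_motif_sampler(seqs, width=6) on a list of DNA strings returns one start position per sequence. Because the site is drawn at random rather than chosen greedily, the result depends on the seed, and a real analysis would track the best-scoring alignment seen across iterations.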
The talking points for Tom Schneider's discussion:
What is the role of Shannon information theory in biology? Shannon's basic
concepts, developed in the late 1940s, contributed substantially to the
information revolution we are enjoying today. But Crick's "Central Dogma" of
molecular biology as "information flow" from DNA to RNA to protein, while
qualitatively descriptive, has been quantitatively disappointing. Our
objective should be to develop a consistent strategy for the application of
Shannon information to molecular systems. By considering the "before" and
"after" states of genetic systems, the amount of information required for
molecular recognition can be calculated. This can be compared to the
information stored in the nucleic acids themselves at the genetic control
points. Colorful "sequence logos" elegantly communicate the contribution of
individual bases to recognition processes. Often the two approaches give
similar results, suggesting that the information stored in the DNA is just
sufficient for the control points to be located in the genome. When the two
approaches give different results, the discrepancy provides an opportunity to test whether
information theory is useful for molecular biology. For example, there is
twice as much information at bacteriophage T7 promoters as the RNA polymerase
should need to find the promoters. This suggests that another protein besides
the polymerase binds at the promoters. This prediction is being experimentally
tested by genetic engineering experiments. The generality of the approach
suggests that a wide range of biomedical applications may eventually follow.
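The two quantities being compared above can be sketched in a few lines. This is a simplified illustration, not the published procedure: it assumes aligned DNA sites of equal length, equiprobable background bases (2 bits maximum per position), and it omits the small-sample correction. The function names r_sequence and r_frequency are hypothetical; the first measures the information stored in the sites (the per-position values are the stack heights in a sequence logo), the second the information needed to locate a given number of sites among the possible positions in a genome.

```python
import math
from collections import Counter


def r_sequence(sites, alphabet="ACGT"):
    """Per-position information (bits) in aligned sites; hypothetical helper.

    Assumes equiprobable background bases and no small-sample correction.
    """
    n = len(sites)
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    per_position = []
    for k in range(len(sites[0])):
        counts = Counter(site[k] for site in sites)
        # Shannon uncertainty of the bases observed at this position.
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        # Information is the decrease in uncertainty from "before" to "after".
        per_position.append(max_bits - h)
    return per_position


def r_frequency(possible_positions, num_sites):
    """Bits needed to pick num_sites out of possible_positions."""
    return math.log2(possible_positions / num_sites)


# Toy example: three fully conserved positions and one random position.
sites = ["ACGT", "ACGA", "ACGC", "ACGG"]
print(sum(r_sequence(sites)))   # 6.0 bits stored in the sites
print(r_frequency(4096, 4))     # 10.0 bits needed to find them
```

When the summed r_sequence is close to r_frequency, the sites carry about as much information as is needed to find them; a ratio near two, as reported for the T7 promoters, is the kind of discrepancy the talk will discuss.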
As always, these will be free-for-all discussions. Bring reprints of any of
your recent publications.
The Lister Hill Center is on the NIH campus, 9000 Rockville Pike. The Center
is the 10-story building immediately north-west of the Ramada Inn in Bethesda.
To get there from the 495 Beltway, exit at Wisconsin heading South and turn
right at the intersection after the Medical Center Metro stop. The third left
leads to Lister Hill Center, and you have made a mistake if the Center is not
now occupying your entire field of vision.
The meeting will begin at 5:45 pm. The doors of the center close at 6:00 pm;
those arriving late and finding the doors locked should phone 301-496-2475
extension 49, from payphones at the Metro stop if necessary.
Tom Schneider
National Cancer Institute
Laboratory of Mathematical Biology
Frederick, Maryland 21702-1201
toms at ncifcrf.gov
John L. Spouge
National Center for Biotechnology Information
National Library of Medicine
38A / 8S 806 NLM NIH
Bethesda MD 20894
spouge at ncbi.nlm.nih.gov
John S. Garavelli
Protein Information Resource
National Biomedical Research Foundation
3900 Reservoir Road
Washington DC 20007-2195
garavelli at NBRF.Georgetown.Edu