From owner-info-theory@net.bio.net Wed Feb 01 22:00:00 1995
Newsgroups: bionet.info-theory,news.answers
Path: biosci!hubcap.clemson.edu!gatech!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@ncifcrf.gov (Tom Schneider)
Subject: Biological Information Theory and Chowder Society FAQ
Message-ID: <D38vwo.Iww@ncifcrf.gov>
Followup-To: bionet.info-theory
Summary: monthly Frequently Asked Questions posting for BITCS
 The news group bionet.info-theory is a forum for discussing information theory
 in biology and for tossing food for thought around.  Other interesting
 mathematical problems in biology are also welcome, as we will try our best to
 take the log of them, so as to convert them into information theory problems.
Originator: toms@fcsparc6
Keywords: FAQ, Biological Information Theory and Chowder Society
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
Date: Tue, 31 Jan 1995 00:55:36 GMT
Approved: news-answers-request@MIT.Edu
Lines: 792
Xref: biosci bionet.info-theory:3082 news.answers:31588

Archive-name: biology/info-theory

***********************************************************

Replies to Frequently Asked Questions (FAQ) for bionet.info-theory

             Biological Information Theory and Chowder Society

version = 1.67 of bionet.info-theory.faq  1995 January 18

***********************************************************
- What is the History of The Biological Information Theory and Chowder Society?
- What Kind of Questions Are Appropriate For Discussion?
- How Can I Learn More About Information Theory and Biology?
- How Do I find Sequence Logos on the Web?
- Is There a Shell Script for Making Sequence Logos?
- Is There a Mosaic Page for Making Sequence Logos?
- Will Authors Send Me Papers?
- Can You Just Point My Mosaic To The FAQ and the Archives?
- How Do I obtain bionet.info-theory BY EMAIL?
- Where Did I Get This FAQ File From Originally?
- What is the IP number of the FAQ archive?
- Where Are the Bionet Archives?
- What Can I Do About Inappropriate Postings?
- What is the official word on copyright of this FAQ?
- Who Takes Care of This Group?
***********************************************************

* What is the History of The Biological Information Theory and Chowder Society?

The Biological Information Theory and Chowder Society (BITCS) is a group of
scientists interested in the biological applications of information theory
(thus the "BIT") who meet informally for dinner (thus the "CS") from time to
time in the Washington, DC, area.  At our dinners we have only one rule ---
food fights are discouraged.

The guys who started this thing did it because we weren't certain we understood
the biological implications of information theory.  Some of us are more
comfortable with the mathematical machinery and assemble biological systems
into grand canonical ensembles whether they want to be there or not; and some
of us think we understand what the biological systems are doing but can't
take a log to base 2.  What we try to do is pry from one another the bits of
knowledge that will help us understand what's going on.

Some of the topics up for discussion in our group are:
  biological applications of information theory
  biochemical molecular machines
  computer methods for recognition of molecular structure and function
  database organization for biomolecular information
  nanotechnology
  the limits of computation
  "dissipationless" (?) computation
  Maxwell's demon
  anecdotes and humor about all these topics
A few relevant papers are listed below.

The group started when Tom Schneider was introduced to John Spouge in 1988.
Tom bounced his ideas about molecular machines off John, and John kept finding
flaws.  Tom would go away rather unhappily for a month and then find a
solution.  But John was always one step ahead...  (and still is, on last
account.)  Tom gave a talk about molecular machines at the Lambda Lunch meeting
on the Bethesda NIH campus, and John introduced John (Steve) Garavelli.  We all
got together with Peter Basser for dinner once in a while to talk about
information theory.  Steve brought in one of the first people to apply
information theory to biology, Hubert Yockey.  Steve Garavelli dubbed the group
the "Biological Information Theory and Chowder Society", which it is still
called.  We are known sometimes as 'chowderheads', and talk about food fights,
but so far have only had electronic food fights!  We hold dinners in Bethesda,
Maryland, on random occasions.

When our informal mailing list became difficult to handle, we petitioned to
start a bionet news group.  We hope to hold roaring discussions, and everyone
is welcome to join.  If you are uncertain about something, quit lurking and ask
on the net.  It may well be that what bothered you is the key to a new piece of
information theory in biology.  (The major advances so far have come from
things that REALLY bugged people.)

We will also announce when and where our (irregular) eatings are and you are
welcome to join if the travel is not too far.  John Spouge
(spouge@ncbi.nlm.nih.gov) usually makes the arrangements.

***********************************************************

* What Kind of Questions Are Appropriate For Discussion?

This faq sheet answers simple questions about this group.  The BIG questions
should be discussed on the net, where we can all haggle over them.  Here are a
few for starters:

What is the role of theory in biology today?
What should be the role of biological theory?

What is information?  How should it be defined?

What bothers you when you read the two papers on the theory of molecular
machines?  (It is only from the things that bother us that we can make progress
in understanding.)  (See references below.)

What are flaws in the theory of molecular machines?

How is ATP used to drive molecular machines?

All communication systems are associated with living things, so is it true that
information theory is really a theory about living things?  Was Shannon really
a great biologist?

What does Maxwell's Demon have to do with all of this?

What are the limits of computers?

What are the limits of nanotechnology?

***********************************************************

* How Can I Learn More About Information Theory and Biology?

REFERENCES - General

There are a huge number of papers related to this topic, just about everything
in molecular biology, lots of chemistry, physics, electronics, evolutionary
theory, thermodynamics, statistical mechanics and the kitchen sink ...  You can
get a pretty good overview by combining the references of Schneider.ccmm,
Schneider.edmm and Leff1990.  References are given in BibTeX format, the
bibliography program associated with LaTeX, the powerful and portable
typesetting program.

By arrangement, books that have prices listed can be ordered over the Internet from:
  Reiter's Scientific & Professional Books
  2021 K Street, NW
  Washington, DC  20006
  1-800-537-4314
  1-202-223-3327
  1-202-296-9103 FAX
  books@reiters.com

Shipping and handling charges are:
in the DC metropolitan area $4.00 for one item, $0.50 for each additional item,
outside the area $4.50 for one item, $0.50 for each additional item.

The prices are current as of October 1994; because publishers are constantly
changing their prices, they should be considered estimates rather than
guaranteed prices.  To open an account you must first either phone or FAX them
and provide a credit card number.  Book orders can then be placed at any time
over the Internet.
        **DO NOT SEND CREDIT CARD NUMBERS OVER THE INTERNET!**

Reiter's carries all of the books on this list except "Information Theory:
Saving Bits", and that one can be special ordered.  If enough interest in this
book is generated by the FAQ, it will be added as regular stock.  (It can also
be ordered directly from the company using the information given.)

# Gonick's Wonderful books (Don't be shy!  They are worth the money!!):

@book{Gonick.computers,
author = "L. Gonick",
title = "The Cartoon Guide to Computers",
edition = "second",
publisher = "HarperCollins",
address = "New York, NY",
isbn = "0-06-273097-5",
price = "price as of 1994 October 31: \$11.00",
year = "1991"}

@book{Gonick.genetics,
author = "L. Gonick",
title = "The Cartoon Guide to Genetics",
edition = "updated",
publisher = "Barnes \& Noble",
address = "New York, NY",
isbn = "0-06-273099-1",
price = "price as of 1994 October 31: \$12.00",
year = "1991"}

@book{Gonick.physics,
author = "L. Gonick
 and A. Huffman",
title = "The Cartoon Guide to Physics",
publisher = "HarperPerennial",
address = "New York, NY",
isbn = "0-06-273100-9",
price = "price as of 1994 October 31: \$12.00",
year = "1990"}

# A good starting point if you don't know much molecular biology:
# (Two volumes)

@book{Watson1987,
author = "J. D. Watson
 and N. H. Hopkins
 and J. W. Roberts
 and J. A. Steitz
 and A. M. Weiner",
title = "Molecular Biology of the Gene",
edition = "fourth",
publisher = "The Benjamin/Cummings Publishing Co., Inc.",
address = "Menlo Park, California",
isbn = "0-8053-9614-4",
price = "price as of 1994 October 31: \$59.95",
year = "1987"}

# This book describes LaTeX and BibTeX:

@book{Lamport1994,
author = "L. Lamport",
title = "\LaTeX: A Document Preparation System,
User's Guide \& Reference Manual",
edition = "second",
publisher = "Addison-Wesley Publishing Company",
address = "Reading, Massachusetts",
isbn = "0-201-52983-1",
price = "price as of 1994 October 31: \$32.95",
year = "1994"}

# ***********************************************************
# REFERENCES - Information Theory

# The best starter book:

@book{Pierce1980,
author = "J. R. Pierce",
title = "An Introduction to Information Theory:
Symbols, Signals and Noise",
edition = "second",
publisher = "Dover Publications, Inc.",
address = "New York",
isbn = "0-486-24061-4",
comment = "
original copyright 1961
Ordering information:  Pierce1980 is currently available by mail from:
   Dover Publications, Inc.
   31 East 2nd Street
   Mineola, New York 11501
order:
   Pierce, An Introduction to Information Theory: Symbols, Signals and Noise
   code number: 24061-4
$7.95 + charges.  Payment in full, no telephone or credit card orders.
Postage and Handling charges are:
Bookrate: $3 (US only)
UPS: $4.50 (US only, not Alaska or Hawaii or PO boxes)
Foreign orders: add 20% of total (minimum $2.50)
Sales Tax (NY residents only)
Foreign Orders Note: Remittances must be sent by international money order or
in U.S. funds via Federal Wire System to Chemical Bank, N. Y.  ABA #021000128.
Mark all remittances `For the account of Dover Publications, Inc.  #001 053
272'.  This information is from the Dover Math and Science Catalogue 9/92",
price = "price as of 1994 October 31: \$8.95",
year = "1980"}

Christopher Hillman (hillman@math.washington.edu) suggests that this one is a
better starting point:  Thomas Cover and Joy A. Thomas, Elements of Information
Theory, Wiley, 1991.  People who have seen both could post their opinions.

# A good introduction to the mathematics:

@book{Sacco1988,
author = "W. Sacco
 and W. Copes
 and C. Sloyer
 and R. Stark",
title = "Information Theory: Saving Bits",
publisher = "Janson Publications, Inc.",
comment = "original address was Providence, Rhode Island",
address = "Dedham, MA",
isbn = "0-939765-25-X",
phone = "(800) 322-6284",
price = "price as of 1994 October 31: \$11.95",
year = "1988"}

# Important originals:

@article{Shannon1948,
author = "C. E. Shannon",
title = "A Mathematical Theory of Communication",
year = "1948",
journal = "Bell System Tech. J.",
volume = "27",
pages = "379-423, 623-656"}

@book{ShannonWeaver1949,
author = "C. E. Shannon
 and W. Weaver",
title = "The Mathematical Theory of Communication",
publisher = "University of Illinois Press",
address = "Urbana",
isbn = "0-252-72548-4",
price = "price as of 1994 October 31: \$9.95",
year = "1949"}

@article{Shannon1949,
author = "C. E. Shannon",
title = "Communication in the Presence of Noise",
year = "1949",
journal = "Proc. IRE",
volume = "37",
pages = "10-21"}

# For the committed: The Complete Works!

@inproceedings{Shannon1993,
author = "C. E. Shannon",
editor = "N. J. A. Sloane and A. D. Wyner",
booktitle = "Claude Elwood Shannon: Collected Papers",
publisher = "IEEE Press",
address = "Piscataway, NJ",
isbn = "0-7803-0434-9",
comment = "IEEE Order Number: PC0331-9
  To order directly by charge card (eg Visa works) you can call
  (908)-981-0060
  $69.95 + $5 handling charge
  delivery in about 2 weeks",
price = "price as of 1994 October 31: \$69.95",
year = "1993"}

# How locks work and other cool stuff:

@book{Macaulay1988,
author = "D. Macaulay",
title = "The Way Things Work",
publisher = "Houghton Mifflin Company",
address = "Boston",
isbn = "0-395-42857-2",
price = "price as of 1994 October 31: \$29.95",
comment = "This book is also available on Windows-Compatible CD-ROM
  cdrom isbn = 1-56458-901-3  Price as of 1994 October 31: \$99.95",
year = "1988"}

# Leff1990 gives a review of the Maxwell's Demon problem.
# See also Schneider.edmm, listed below.

@book{Leff1990,
author = "H. S. Leff and A. F. Rex",
title = "Maxwell's Demon: Entropy, Information, Computing",
publisher = "Princeton University Press",
address = "Princeton, N. J.",
phone = "1(800) 777-4726",
isbn.hard = "0-691-08726-1 (hard cover)",
price.hard = "price as of 1994 October 31: \$80.00",
isbn.paper = "0-691-08727-X (paperback)",
price.paper = "price as of 1994 October 31: \$26.95",
year = "1990"}

# ***********************************************************

# REFERENCES - Jaynes

@article{JaynesI,
  author = "Edwin T. Jaynes",
  title = "Information Theory and Statistical Mechanics",
  year = 1957,
  journal = "Physical Review",
  volume = "106",
  pages = "620-630"}

@article{JaynesII,
  author = "Edwin T. Jaynes",
  title = "Information Theory and Statistical Mechanics. {II}",
  year = 1957,
  journal = "Physical Review",
  volume = "108",
  pages = "171-190"}

# A version of Jaynes' new book "PROBABILITY THEORY -- THE LOGIC OF SCIENCE"
# is available on the net.  See:
#
# ftp://bayes.wustl.edu/Jaynes.book/
# Larry Bretthorst (larry@bayes.wustl.edu)
#
# http://omega.albany.edu:8008/JaynesBook.html
# Carlos Rodriguez (carlos@math.albany.edu)
#
# Tom Schneider's pointers to these places:
# http://www.ncifcrf.gov:2001/~toms/Jaynes.html
#
# Note:  The book is being written now and new versions come out every once in a
# while.  One of these locations may be more up to date than the other.

# ***********************************************************
# REFERENCES - Schneider

@article{Schneider1986,
author = "T. D. Schneider
 and G. D. Stormo
 and L. Gold
 and A. Ehrenfeucht",
title = "Information content of binding sites on nucleotide sequences",
journal = "J. Mol. Biol.",
volume = "188",
pages = "415-431",
year = "1986"}

@inproceedings{Schneider1988,
author = "T. D. Schneider",
editor = "G. J. Erickson and C. R. Smith",
title = "Information and entropy of patterns in genetic switches",
booktitle = "Maximum-Entropy and Bayesian Methods in Science and Engineering",
volume = "2",
pages = "147-154",
publisher = "Kluwer Academic Publishers",
address = "Dordrecht, The Netherlands",
year = "1988"}

@article{Schneider1989,
author = "T. D. Schneider
 and G. D. Stormo",
title = "Excess Information at Bacteriophage {T7} Genomic Promoters
Detected by a Random Cloning Technique",
year = "1989",
journal = "Nucl. Acids Res.",
volume = "17",
pages = "659-674"}

@article{Schneider.Stephens.Logo,
author = "T. D. Schneider
 and R. M. Stephens",
title = "Sequence Logos: A New Way to Display Consensus Sequences",
journal = "Nucl. Acids Res.",
volume = "18",
pages = "6097-6100",
year = "1990"}

@article{Schneider.ccmm,
author = "T. D. Schneider",
title = "Theory of Molecular Machines.
{I. Channel} Capacity of Molecular Machines",
journal = "J. Theor. Biol.",
volume = "148",
number = "1",
pages = "83-123",
note = "{(Note: The figures were printed out of order!
Fig. 1 is on p. 97.)}",
year = 1991}

@article{Schneider.edmm,
author = "T. D. Schneider",
title = "Theory of Molecular Machines.
{II. Energy} Dissipation from Molecular Machines",
journal = "J. Theor. Biol.",
volume = "148",
number = "1",
pages = "125-137",
year = 1991}

@article{Herman.Schneider1992,
author = "N. D. Herman
  and T. D. Schneider",
title = "High Information Conservation Implies that at Least Three Proteins
Bind Independently to {F} Plasmid {{\em incD\/}} Repeats",
journal = "J. Bact.",
volume = "174",
pages = "3558-3560",
year = "1992"}

@article{Stephens.Schneider.Splice,
author = "R. M. Stephens
  and T. D. Schneider",
title = "Features of spliceosome evolution and function
inferred from an analysis of the information at human splice sites",
journal = "J. Mol. Biol.",
volume = "228",
pages = "1124-1136",
year = "1992"}

@article{Papp.helixrepa,
author = "P. P. Papp
 and D. K. Chattoraj
 and T. D. Schneider",
title = "Information Analysis of Sequences that Bind
the Replication Initiator {RepA}",
journal = "J. Mol. Biol.",
comment = "Cover of 233, number 2!",
volume = "233",
pages = "219-230",
year = "1993"}

@article{Schneider.nano2,
author = "T. D. Schneider",
title = "Sequence Logos, Machine/Channel Capacity,
{Maxwell}'s Demon, and Molecular Computers:
a Review of the Theory of Molecular Machines",
journal = "Nanotechnology",
volume = "5",
number = "1",
pages = "1-18",
year = "1994"}
# On line at: ftp://ftp.ncifcrf.gov/pub/delila/nano2.ps

# ***********************************************************
# REFERENCES - Yockey

@book{Yockey1958a,
editor = "Hubert P. Yockey and Robert P. Platzman and Henry Quastler",
title = "Symposium on Information Theory in Biology",
booktitle = "Symposium on Information Theory in Biology",
publisher = "Pergamon Press",
address = "New York, London",
comment = "out of print",
year = "1958"}

@article{Yockey1981,
author = "Hubert P. Yockey",
year = 1981,
title = "Self-organization Origin of Life Scenarios and Information Theory",
journal = "J. Theor. Biol.",
volume = "91",
pages = "13-31"}

@book{Yockey1992,
author = "H. P. Yockey",
title = "Information Theory and Molecular Biology",
publisher = "Cambridge University Press",
address = "Cambridge",
isbn = "0-521-35005-0",
comment = "40 West 20th Street,
New York, N. Y.  10011-4211,
order number 350050",
phone = "1-800-827-7423",
price = "price as of 1994 October 31: \$74.95",
year = "1992"}

Following is Hubert Yockey's reference list:

Yockey, Hubert P. Information Theory and Molecular Biology, Cambridge UK:
Cambridge University Press (1992)
Yockey, Hubert P. (1990). When is random random? Nature, 344, 823.
Yockey, Hubert P. (1981). Self-organization origin of life scenarios and
information theory. Journal of Theoretical Biology, 91, 13-31.
Yockey, Hubert P. (1979). Do overlapping genes violate molecular biology and
the theory of evolution? Journal of Theoretical Biology, 80, 21-26.
Yockey, Hubert P. (1978). Can the Central Dogma be derived from information
theory? Journal of Theoretical Biology, 74, 149-152.
Yockey, Hubert P. (1977a). A prescription which predicts functionally
equivalent residues at given sites in protein sequences. Journal of
Theoretical Biology, 67, 337-343.
Yockey, Hubert P. (1977b). On the information content of cytochrome c.
Journal of Theoretical Biology, 67, 345-376.
Yockey, Hubert P. (1977c). A calculation of the probability of spontaneous
biogenesis by information theory. Journal of Theoretical Biology, 67,
377-398.
Yockey, Hubert P. (1974). An application of information theory to the Central
Dogma and the sequence hypothesis. Journal of Theoretical Biology, 46,
369-406.
Yockey, Hubert P. (1960). The Use of Information Theory in Aging and Radiation
Damage. In The Biology of Aging, American Institute of Biological Sciences
Symposium No. 6 (1960), pp338-347.
Yockey, Hubert P., Platzman, Robert P. & Quastler, Henry, eds. (1958a).
Symposium on Information Theory in Biology, New York, London: Pergamon Press.
Yockey, Hubert P. (1958b). A study of aging, thermal killing and radiation
damage by information theory. In Symposium on Information Theory in Biology.
eds. Hubert P. Yockey, Robert Platzman & Henry Quastler, pp297-316. New York,
London: Pergamon Press.
Yockey, Hubert P. (1956). An application of information theory to the physics
of tissue damage. Radiation Research, 5, 146-155.

***********************************************************

* Will Authors Send Me Papers?

Tom Schneider will mail you copies of his papers.  Send your physical address
to him at toms@ncifcrf.gov.  Some papers are on line already, see also the
README file in the ftp archive ftp.ncifcrf.gov in pub/delila.

If you are willing to send out papers or have papers you would like listed
here, please contact Tom Schneider.

You can request them by Mosaic from the page
http://www.ncifcrf.gov:2001/~toms/papers.html

***********************************************************

* How Do I find Sequence Logos on the Web?

http://www.ncifcrf.gov:2001/~toms/sequencelogo.html

***********************************************************

* Is There a Shell Script for Making Sequence Logos?

Yes, you will find the one Shmuel Pietrokovski wrote in the ftp archive
ftp.ncifcrf.gov in pub/delila/logoaid.  (Also available in
bioinformatics.weizmann.ac.il/pub/software/logoaid.)
***********************************************************

* Is There a Mosaic Page for Making Sequence Logos?

Yes, Steve Brenner has done it!

http://www.bio.cam.ac.uk/seqlogo/

***********************************************************

* Can You Just Point My Mosaic To The FAQ and the Archives?

This file and the postings may be obtained by Mosaic through the world wide web
at:

<UL>

<H2>
<LI> <A HREF="ftp://ftp.ncifcrf.gov/pub/delila/bionet.info-theory.faq.Z">
FAQ (Frequently Asked Questions)</A>
about the Biological Information Theory and Chowder Society
</H2>

<H2>
<LI> <A HREF="gopher://net.bio.net/11/BIO-INFO">
Gopher Link to Archive of All Postings.</A>
This archive contains the most recent postings
as separate documents.</H2>

<H2>
<LI> <A HREF="ftp://net.bio.net/pub/BIOSCI/BIO-INFO">
Archive of Monthly Postings.</A>
This archive (currently) contains postings
from each month as a single document.</H2>

<H2>
<LI> <A HREF="ftp://ftp.bio.indiana.edu/usenet/bionet/info-theory/">
FTP link to Archive of Postings at IUBIO.</A>
This archive contains individual postings.
Older postings are collected by the month as a single document.
There is an index for each month.
</H2>

</UL>

***********************************************************

* How Do I obtain bionet.info-theory BY EMAIL?

If you have access to USENET news YOU DO NOT NEED AN E-MAIL SUBSCRIPTION!!  We
strongly encourage all interested users to explore getting USENET news at your
site.  It's MUCH easier on you than an e-mail subscription!  Please consult
your systems manager or contact biosci-help@net.bio.net for assistance if
needed.

The BIOSCI (email) name for the forum is BIO-INFO.

Depending on where you are, you have to do different things to subscribe or be
removed from the email subscription list:

SUBSCRIBING / UNSUBSCRIBING
   North or South America or Pacific Rim:
     Using the computer account in which you want to receive mail
     messages, please send an email message to the e-mail server at
     biosci-server@net.bio.net.  Leave the Subject: line blank.
     In the body of the message include the line

     subscribe bio-info

     to add yourself to the mailing list or

     unsubscribe bio-info

     to cancel an existing subscription.  If you need personal
     subscription assistance, please contact biosci-help@net.bio.net.

   Europe, Africa, and Central Asia:
     Send an email message to the person at
     biosci@daresbury.ac.uk
     requesting a subscription or removal from the BIO-INFO forum.
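
As a rough illustration of the server instructions above (not part of the
original FAQ), the request for the net.bio.net server could be composed like
this in Python.  The sender address is a made-up placeholder, the function name
is hypothetical, and the sketch only builds the message; it does not send
anything.  The Subject header is simply omitted so that the subject stays
blank, as the instructions ask.

```python
from email.message import EmailMessage

# Server address and command words are taken from the FAQ text above.
SERVER = "biosci-server@net.bio.net"


def build_request(sender, action="subscribe", forum="bio-info"):
    """Build (but do not send) a subscription request for the e-mail server."""
    if action not in ("subscribe", "unsubscribe"):
        raise ValueError("action must be 'subscribe' or 'unsubscribe'")
    msg = EmailMessage()
    msg["From"] = sender      # the account that should receive the postings
    msg["To"] = SERVER
    # No Subject header is set, so the Subject: line stays blank.
    msg.set_content(f"{action} {forum}\n")
    return msg


if __name__ == "__main__":
    # "you@example.org" is a placeholder, not a real subscriber.
    print(build_request("you@example.org"))
```

Note that the message goes to the server address, never to the forum address
itself.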

SENDING OUT POSTINGS
Thereafter, address email messages for this forum to one of:

   North or South America or Pacific Rim:
      bio-info@net.bio.net

   Europe, Africa, and Central Asia:
      bio-info@daresbury.ac.uk

You can post to either of the above addresses if you want.  We only request that
you sign up at your local node in order to optimize the use of the network
resources for message distribution.

Do not send subscription requests to any of these addresses, or you will have
sent it to everybody on the planet (to your great embarrassment, and we will
drub you with food cake)!  Let me say that again:  please do not post requests
for subscription or being removed from the list to the list itself, that takes
up bandwidth all over the world!

If you have problems, contact the subscription site manager who you signed up
with.  If your problem is not resolved, please contact
biosci-help@net.bio.net.

DO NOT CONTACT TOM SCHNEIDER FOR SUBSCRIPTIONS OR UNSUBSCRIBING!

***********************************************************

* Where Did I Get This FAQ File From Originally?

The latest version of this FAQ is stored in the anonymous ftp archive
ftp.ncifcrf.gov in pub/delila under the name bionet.info-theory.faq and also as
bionet.info-theory.faq.Z (The .Z means it is compressed; remember to use the
binary transfer mode if you pick up the latter.  See the uncompressed README
file in the archive for where to get the uncompress program if you need it.)
Please send questions and comments to: Tom Schneider toms@ncifcrf.gov
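
For readers who script their transfers, the binary-mode advice above might look
like the following Python sketch.  It is only an illustration under stated
assumptions: the host and path are the ones named in this FAQ and may not be
reachable from your site, the function names are hypothetical, and the sketch
does not uncompress the .Z file.

```python
from ftplib import FTP

# Host, directory and file name as given in the FAQ text above.
HOST = "ftp.ncifcrf.gov"
REMOTE_DIR = "pub/delila"
COMPRESSED_NAME = "bionet.info-theory.faq.Z"


def fetch_binary(ftp, remote_name, sink):
    """Retrieve remote_name in binary mode, passing each chunk of bytes to sink."""
    # retrbinary transfers the file unmodified, which a compressed file needs.
    ftp.retrbinary(f"RETR {remote_name}", sink)


def fetch_faq(local_path=COMPRESSED_NAME):
    """Anonymous ftp download of the compressed FAQ (needs network access)."""
    with FTP(HOST) as ftp, open(local_path, "wb") as out:
        ftp.login()            # no arguments means an anonymous login
        ftp.cwd(REMOTE_DIR)
        fetch_binary(ftp, COMPRESSED_NAME, out.write)
```

Opening the local file with "wb" matters as much as the binary transfer: a
text-mode write could alter the compressed bytes on some systems.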

This file is also posted monthly on news.answers and bionet.info-theory.

***********************************************************

* What is the IP number of the FAQ archive?

For ftp.ncifcrf.gov you can use 129.43.1.11

***********************************************************
* Where Are the Bionet Archives?

The entire collection of BIOSCI/bionet messages from inception is
available via the biosci.src WAIS source at net.bio.net.  Contact
biosci-help@net.bio.net for further help with accessing this WAIS source.

FTP archives of all the BIOSCI/bionet messages are available at net.bio.net
[134.172.2.69] in /pub/BIOSCI.  bionet.info-theory is in pub/BIOSCI/BIO-INFO.
Files are in mailbox format, with names of the form YYMM (YY=last 2 digits of
the year, MM=cardinal number of the month, zero padded).  The current month's
postings are in the file 'current'.  Contact biosci-help@net.bio.net for
further help with or comments on the archives.
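
The YYMM naming rule above can be sketched in a few lines of Python.  The
function names are hypothetical, and joining the file name onto the ftp
directory is an assumption about how the archive is laid out, based only on the
description in this FAQ.

```python
# Directory named in the FAQ; appending the file name to it is an assumption.
BASE_URL = "ftp://net.bio.net/pub/BIOSCI/BIO-INFO"


def archive_name(year, month):
    """Return the YYMM file name: last two digits of the year, zero-padded month."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return f"{year % 100:02d}{month:02d}"


def archive_url(year, month):
    """Return the presumed ftp location of one month's mailbox file."""
    return f"{BASE_URL}/{archive_name(year, month)}"


if __name__ == "__main__":
    print(archive_name(1995, 1))   # prints 9501
    print(archive_url(1994, 12))
```

The current month is the exception: its postings live in the file 'current'
rather than under a YYMM name.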

All the bionet.* newsgroups, including info-theory, are also archived for
Gopher retrieval from the IUBIO Gopher hole and for anonymous ftp from
ftp.bio.indiana.edu, directory usenet/bionet/...

The archives can be accessed by gopher or ftp running under the Mosaic
interface.  The URL (Uniform Resource Locator) for gopher is:
   gopher://net.bio.net/11/BIO-INFO
This gives individual postings.  By ftp one can use:
   ftp://net.bio.net/pub/BIOSCI/BIO-INFO
Unfortunately this gives the entire month in a single document.

***********************************************************

* What Can I Do About Inappropriate Postings?

The short form of this news group's name, bio-info, can be a little confusing
to some people inexperienced in network communications or with little knowledge
of the discipline (if there is any :-) of biological information theory.  It
can and has been mistaken as a news group for general biological information.
Our readers should be aware that when such postings come to our attention, we
do attempt to inform, privately, the people who make these inappropriate
postings of the error of their ways and suggest alternative or more appropriate
venues.

Subjecting the writers of inappropriate postings to public excoriation is not a
good policy because the mistake is usually inadvertent and the follow-up
postings add further to the irritation of our regular readers.  When others
publicly reply to such posts in this news group, although they may think they
are being polite to the original poster, they are still annoying our regular
readers.  We suggest that a better policy for readers who do wish to reply to
inappropriate posts is to do so privately or to an appropriate news group.

***********************************************************

* What is the official word on copyright of this FAQ?

This FAQ fits the description in the U. S. Copyright Act of a "United States
Government work".  It was written as a part of my official duties as a
Government employee.  This means it cannot be copyrighted.  The article is
freely available without a copyright notice, and there are no restrictions on
its use, now or subsequently.  I retain no rights in the FAQ.

Thomas D. Schneider

***********************************************************

* Who Takes Care of This Group?

John S. Garavelli
Protein Information Resource
National Biomedical Research Foundation
Washington, DC  20007
garavelli@gunbrf.bitnet

Tom Schneider
National Cancer Institute
Laboratory of Mathematical Biology
Frederick, Maryland  21702-1201
toms@ncifcrf.gov

John L. Spouge
National Center for Biotechnology Information
National Library of Medicine
Bethesda, MD  20894
spouge@frodo.nlm.nih.gov

Please email comments and suggestions on this faq sheet to Tom.

John Garavelli (who also answers to "Steve" if you want to avoid confusion)
often organizes dinner speakers.

John Spouge often arranges dinner locations.

***********************************************************

From owner-info-theory@net.bio.net Thu Feb 02 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!ns1.faseb.org!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Message-ID: <D3G6D1.BBL@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
References: <3glqca$pof@mserv1.dl.ac.uk> <Pine.SUN.3.90.950131163822.12087B-100000@snfma1.if.usp.br>
Date: Fri, 3 Feb 1995 23:24:37 GMT
Lines: 28

In article <Pine.SUN.3.90.950131163822.12087B-100000@snfma1.if.usp.br>
szeinfel@SNFMA1.IF.USP.BR (Rafael Iosef Najmanovich Szeinfeld {S) writes:

| 	Now, my feeling is that evolution is related to increase in 
| control processes which can be seen as machines responsible for keeping the
| entropy low at the expense of energy. Is my point of view of some
| interest to our discussion?

Could you expand on that?

| 	I would like to note that Ludwig Boltzmann wrote quite a lot 
| about the relationship between life and thermodynamics and in particular
| to entropy. I had no chance to read his papers but I suppose that they 
| may be valuable to this discussion.  

I wasn't aware of his writings on this topic; perhaps you could find them for
us?

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www.ncifcrf.gov:2001/~toms
soon: 
  http://www-lmmb.ncifcrf.gov/~toms
works now: 
  http://www-lmmb.ncifcrf.gov/

From owner-info-theory@net.bio.net Fri Feb 03 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!news.sprintlink.net!EU.net!ieunet!maths.tcd.ie!not-for-mail
From: tim@maths.tcd.ie (Timothy Murphy)
Newsgroups: bionet.info-theory
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Date: 4 Feb 1995 15:58:29 -0000
Organization: Dept. of Maths, Trinity College, Dublin, Ireland.
Lines: 71
Distribution: bionet
Message-ID: <3h0875$ek1@hamilton.maths.tcd.ie>
References: <3glqca$pof@mserv1.dl.ac.uk> <3gmgir$ncd@hamilton.maths.tcd.ie> <D3G7B0.Bp1@ncifcrf.gov>
NNTP-Posting-Host: hamilton.maths.tcd.ie

toms@fcsparc6.ncifcrf.gov (Tom Schneider) writes:

>| but it seems to me perfectly sensible to compare the Chaitin entropy
>| (or informational content) of DNA -- regarded simply as a string --
>| in higher and lower organisms, and argue that the difference 
>| does reflect a difference in the "complexity" of the organisms.

>That line of thought leads to confusion.  How about trying this.  Below are the
>DNA sequences to which the lambda cro protein binds:

>   1 tgcgtcctgctgatgtgctcagtatcaccgccagtggtatttatgtcaacaccgccagaga
...
>  12 cctccttagtacatgcaaccattatcaccgccagaggtaaaatagtcaacacgcacggtgt
...
>1.  Approximately how many bits are used to describe the sequences above for
>purposes of data transmission or storage?

I haven't the slightest idea.
It is of the nature of algorithmic entropy that one cannot compute H(s)
(in general) -- one can only give upper bounds to it.

>2.  What does the H function look like for each column?  That is, if l is the
>position along the site, from -30 to 30, and b is a base, then you can
>calculate the frequency of each base at each position, f(b,l).  Ignoring small
>sample problems, what does the curve H(l) = - sum_b f(b,l) log2 f(b,l) look
>like?  In particular, what are the values at positions +7 and +15?

This is a misunderstanding of the definition of H(s).
It is not defined by frequency.
(Certainly the frequencies you quote will give an upper bound to H(s),
but that is all.)

>3.  How can you account for the observation that H is larger on the outside of
>the site if H is information?  The protein does not contact the DNA over a region
>much more than -10 to +10.

I don't know what "H is larger on the outside" means.
H(s) is defined for the whole string s.

>4.  What is the amount of information that cro sees in this sequence set?

It is perfectly possible that H(s) does not coincide
with your idea of "informational content".
Just as "potential energy" may not coincide with one's idea of "energy".
It doesn't really matter -- use another word if you prefer.

My question is simply whether H(s) where s is the DNA string
could make a sensible definition of the "complexity" of a creature.

>If you work through this example, I am sure it will illuminate your
>understanding of H (uncertainty) and information!

I'm just using H(s) to mean 
the algorithmic (Chaitin) entropy of the string s:
the length of the shortest program p which will output s
when fed into a chosen universal Turing machine U:

H(s) = min {|p|: U(p) = s}

I haven't said anything about uncertainty or information
(except to mention that "informational content" 
is an alternative name used by Chaitin for the measure H(s) -- 
ignore this if you feel it causes confusion).
If you think use of the letter 'H' causes confusion, use another letter.


-- 
Timothy Murphy  
e-mail: tim@maths.tcd.ie
tel: +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland

From owner-info-theory@net.bio.net Fri Feb 03 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!bcm!cs.utexas.edu!swrinde!emory!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Message-ID: <D3G7B0.Bp1@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
References: <3glqca$pof@mserv1.dl.ac.uk> <3gmgir$ncd@hamilton.maths.tcd.ie>
Distribution: bionet
Date: Fri, 3 Feb 1995 23:45:00 GMT
Lines: 61

In article <3gmgir$ncd@hamilton.maths.tcd.ie> tim@maths.tcd.ie (Timothy Murphy)
writes:

| but it seems to me perfectly sensible to compare the Chaitin entropy
| (or informational content) of DNA -- regarded simply as a string --
| in higher and lower organisms, and argue that the difference 
| does reflect a difference in the "complexity" of the organisms.

That line of thought leads to confusion.  How about trying this.  Below are the
DNA sequences to which the lambda cro protein binds:

     ---------------------                   +++++++++++++++++++++
     322222222221111111111--------- +++++++++111111111122222222223
     0987654321098765432109876543210123456789012345678901234567890
     .............................................................
   1 tgcgtcctgctgatgtgctcagtatcaccgccagtggtatttatgtcaacaccgccagaga
   2 tctctggcggtgttgacataaataccactggcggtgatactgagcacatcagcaggacgca
   3 tcaccgccagtggtatttatgtcaacaccgccagagataatttatcaccgcagatggttat
   4 ataaccatctgcggtgataaattatctctggcggtgttgacataaataccactggcggtga
   5 gtcaacaccgccagagataatttatcaccgcagatggttatctgtatgttttttatatgaa
   6 ttcatataaaaaacatacagataaccatctgcggtgataaattatctctggcggtgttgac
   7 ttttgtgctcatacgttaaatctatcaccgcaagggataaatatctaacaccgtgcgtgtt
   8 aacacgcacggtgttagatatttatcccttgcggtgatagatttaacgtatgagcacaaaa
   9 atcaccgcaagggataaatatctaacaccgtgcgtgttgactattttacctctggcggtga
  10 tcaccgccagaggtaaaatagtcaacacgcacggtgttagatatttatcccttgcggtgat
  11 acaccgtgcgtgttgactattttacctctggcggtgataatggttgcatgtactaaggagg
  12 cctccttagtacatgcaaccattatcaccgccagaggtaaaatagtcaacacgcacggtgt

The numbers on the top are a "numbar" - read them vertically.  The
sequences run from -30 to +30.

Here are some questions:

1.  Approximately how many bits are used to describe the sequences above for
purposes of data transmission or storage?

2.  What does the H function look like for each column?  That is, if l is the
position along the site, from -30 to 30, and b is a base, then you can
calculate the frequency of each base at each position, f(b,l).  Ignoring small
sample problems, what does the curve H(l) = - sum_b f(b,l) log2 f(b,l) look
like?  In particular, what are the values at positions +7 and +15?

3.  How can you account for the observation that H is larger on the outside of
the site if H is information?  The protein does not contact the DNA over a region
much more than -10 to +10.

4.  What is the amount of information that cro sees in this sequence set?

If you work through this example, I am sure it will illuminate your
understanding of H (uncertainty) and information!

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www.ncifcrf.gov:2001/~toms
soon: 
  http://www-lmmb.ncifcrf.gov/~toms
works now: 
  http://www-lmmb.ncifcrf.gov/

From owner-info-theory@net.bio.net Sun Feb 05 22:00:00 1995
Path: biosci!rutgers!gatech!concert!bigblue.oit.unc.edu!mmlds1.pha.unc.edu!iiv
From: Iosif Vaisman <iiv@mmlds1.pha.unc.edu>
Newsgroups: bionet.info-theory
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Date: Mon, 6 Feb 1995 16:14:27 -0500
Organization: The University of North Carolina at Chapel Hill
Lines: 105
Message-ID: <Pine.ULT.3.91.950206144824.211B-100000@mmlds1.pha.unc.edu>
References: <3glqca$pof@mserv1.dl.ac.uk> <Pine.SUN.3.90.950131163822.12087B-100000@snfma1.if.usp.br> <D3G6D1.BBL@ncifcrf.gov>
NNTP-Posting-Host: mmlds1.pha.unc.edu
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <D3G6D1.BBL@ncifcrf.gov> 

On Fri, 3 Feb 1995, Tom Schneider wrote:

> In article <Pine.SUN.3.90.950131163822.12087B-100000@snfma1.if.usp.br>
> szeinfel@SNFMA1.IF.USP.BR (Rafael Iosef Najmanovich Szeinfeld {S) writes:
> 
> | 	I would like to note that Ludwig Boltzmann wrote quite a lot 
> | about the relationship between life and thermodynamics and in particular 
> | to entropy. I had no chance to read his papers but I suppose that they 
> | may be valuable to this discussion.  
> 
> I wasn't aware of his writings on this topic; perhaps you could find them for
> us?
> 

"...if you will ask me whether our century will be referred to as a steel 
age or as a steam and electricity age I will answer without second thought 
that it will be called the Darwin Age." -Ludwig Boltzmann

Boltzmann's views on life and evolution can be found in many of his papers,
e.g. Der zweite Hauptsatz der mechanischen Wärmetheorie (1886), Über die Frage 
nach der objektiven Existenz der Vorgänge in der unbelebten Natur (1897).

Sources:

          AUTHOR: Boltzmann, Ludwig, 1844-1906.
           TITLE: Populäre Schriften / Ludwig Boltzmann ; eingel. u. ausgew.
                   von Engelbert Broda.
     PUBLICATION: Braunschweig ; Wiesbaden : Vieweg, 1979.
     DESCRIPTION: 290 p. ; 21 cm.
           NOTES: Reprint of the 1905 ed. published by J. A. Barth, Leipzig.

          AUTHOR: Boltzmann, Ludwig, 1844-1906.
           TITLE: Principien der Naturfilosofi = Lectures on natural
                   philosophy, 1903-1906 / Ludwig Boltzmann ; I.M. Fasol-
                   Boltzmann, ed. ; with an essay by S.G. Brush and G.Fasol.
     PUBLICATION: Berlin ; New York : Springer-Verlag, 1990.
           NOTES: Prefatory matter in German and English.


English translations of Boltzmann's papers:

          AUTHOR: Boltzmann, Ludwig, 1844-1906.
           TITLE: Theoretical physics and philosophical problems : selected
                   writings / Ludwig Boltzmann ; edited by Brian McGuinness ;
                   with a foreword by S. R. de Groot ; [translations from the
                   German by Paul Foulkes].
     PUBLICATION: Dordrecht ; Boston : Reidel Pub. Co., c1974.
     DESCRIPTION: xvi, 280 p. : ill. ; 23 cm.
          SERIES: Vienna circle collection ; v. 5

          AUTHOR: Boltzmann, Ludwig, 1844-1906.
           TITLE: Ludwig Boltzmann : his later life and philosophy, 1900-1906:
                   book one: a documentary history / edited by John Blackmore.
     PUBLICATION: Dordrecht ; Boston : Kluwer Academic Publishers, 1995-
          SERIES: Boston studies in the philosophy of science ; v. 168
  

Discussions on Boltzmann and evolution:

Yehuda Elkana, Boltzmann's Scientific Research Program and Its Alternatives.
In:        TITLE: The Interaction between science and philosophy. Edited by
                   Y. Elkana.
     PUBLICATION: Atlantic Highlands, N.J., Humanities Press [1974]
     DESCRIPTION: xvii, 481 p. 24 cm.
          SERIES: The Van Leer Jerusalem Foundation series
           NOTES: Papers from a symposium sponsored by the Israel Academy of
                   Sciences and Humanities, the Hebrew University of
                   Jerusalem, and the Van Leer Foundation for the Advancement
                   of Human Culture, held in Jerusalem in Jan. 1971 to honor
                   Professor Samuel Sambursky on his 70th birthday.

          AUTHOR: Brush, Stephen G.
           TITLE: The kind of motion we call heat : a history of the kinetic
                   theory of gases in the 19th century / Stephen G. Brush.
     PUBLICATION: Amsterdam ; New York : North-Holland Pub. Co. ;  New York :
                   Elsevier, 1976.
     DESCRIPTION: 2 v. (769, xxxix p.) ; 24 cm.
          SERIES: Studies in statistical mechanics ; v. 6

           TITLE: Ich bin, also denke ich : die evolutionäre
                   Erkenntnistheorie / Franz Kreuzer im Gespräch mit
                   Engelbert Broda und Rupert Riedl.
     PUBLICATION: Wien : Deuticke, 1981.
     DESCRIPTION: 123 p. ; 21 cm.
           NOTES: "Aus Anlass des 75. Todestages von Ludwig Boltzmann"--Cover.

          AUTHOR: Prigogine, Ilya and Stengers, Isabelle 
           TITLE: Order out of chaos : man's new dialogue with nature / Ilya
                   Prigogine and Isabelle Stengers ; foreword by Alvin
                   Toffler.
     PUBLICATION: Toronto ; New York, N.Y. : Bantam Books, 1984.
     DESCRIPTION: xxxi, 349 p. : ill. ; 23 cm.
           NOTES: Based on the authors' La nouvelle alliance.

          AUTHOR: Prigogine, Ilya
           TITLE: From being to becoming : time and complexity in the
                   physical sciences / Ilya Prigogine.
     PUBLICATION: San Francisco : W. H. Freeman, c1980.
     DESCRIPTION: xix, 272 p. : ill. ; 24 cm.
           NOTES: Bibliography: p. [257]-262.



Iosif Vaisman
UNC-Chapel Hill

From owner-info-theory@net.bio.net Tue Feb 07 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!adam.cc.sunysb.edu!news.sprintlink.net!howland.reston.ans.net!lamarck.sura.net!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Message-ID: <D3nJzJ.1tt@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
References: <3gmgir$ncd@hamilton.maths.tcd.ie> <D3G7B0.Bp1@ncifcrf.gov> <3h0875$ek1@hamilton.maths.tcd.ie>
Distribution: bionet
Date: Tue, 7 Feb 1995 23:02:06 GMT
Lines: 107

In article <3h0875$ek1@hamilton.maths.tcd.ie> tim@maths.tcd.ie (Timothy Murphy) writes:

| toms@fcsparc6.ncifcrf.gov (Tom Schneider) writes:
| 
| >| but it seems to me perfectly sensible to compare the Chaitin entropy
| >| (or informational content) of DNA -- regarded simply as a string --
| >| in higher and lower organisms, and argue that the difference 
| >| does reflect a difference in the "complexity" of the organisms.
| 
| >That line of thought leads to confusion.  How about trying this.  Below are the
| >DNA sequences to which the lambda cro protein binds:
| 
| >   1 tgcgtcctgctgatgtgctcagtatcaccgccagtggtatttatgtcaacaccgccagaga
| ...
| >  12 cctccttagtacatgcaaccattatcaccgccagaggtaaaatagtcaacacgcacggtgt
| ...
| >1.  Approximately how many bits are used to describe the sequences above for
| >purposes of data transmission or storage?
| 
| I haven't the slightest idea.
| It is of the nature of algorithmic entropy that one cannot compute H(s)
| (in general) -- one can only give upper bounds to it.

So we are speaking of different things.  How about trying the simple Shannon
H measure and seeing where it leads you?  You can get real numbers out of
the data I posted.

The simple answer to the first question is that there are (roughly!) 2 bits per
base and there are 12 lines of 61 bases.  So this gives 2*12*61=1464 bits.

| >2.  What does the H function look like for each column?  That is, if l is the
| >position along the site, from -30 to 30, and b is a base, then you can
| >calculate the frequency of each base at each position, f(b,l).  Ignoring small
| >sample problems, what does the curve H(l) = - sum_b f(b,l) log2 f(b,l) look
| >like?  In particular, what are the values at positions +7 and +15?
| 
| This is a misunderstanding of the definition of H(s).
| It is not defined by frequency.
| (Certainly the frequencies you quote will give an upper bound to H(s),
| but that is all.)

H(s) is not H(l).  H(l) is the Shannon uncertainty measure at position l in the
set of sites.  Shannon uncertainty was defined on probabilities though, not
frequencies.  If we substitute the frequencies we measure from the sequences,
then a small sample correction is required.  In this case the correction is
small enough to set aside for now.  (See the appendix of Schneider1986 for how
to do the correction.)

| >3.  How can you account for the observation that H is larger on the outside of
| >the site if H is information?  The protein does not contact the DNA over a region
| >much more than -10 to +10.
| 
| I don't know what "H is larger on the outside" means.
| H(s) is defined for the whole string s.

H(l) is defined for each column of the data.  Try the calculation and see what
happens.  Can you interpret the curve?  What does it mean in terms of biology?
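
For readers who want to try the column calculation, here is a minimal sketch
in Python.  It assumes the twelve sequences exactly as posted earlier in this
thread, uses the measured frequencies f(b,l) in place of probabilities, and
applies no small-sample correction.  The names SITES, column_uncertainty and
uncertainty_curve are illustrative only and do not come from any published
program.

```python
from collections import Counter
from math import log2

# The twelve lambda cro binding-site sequences from the thread,
# aligned from -30 to +30 (61 bases each).
SITES = [
    "tgcgtcctgctgatgtgctcagtatcaccgccagtggtatttatgtcaacaccgccagaga",
    "tctctggcggtgttgacataaataccactggcggtgatactgagcacatcagcaggacgca",
    "tcaccgccagtggtatttatgtcaacaccgccagagataatttatcaccgcagatggttat",
    "ataaccatctgcggtgataaattatctctggcggtgttgacataaataccactggcggtga",
    "gtcaacaccgccagagataatttatcaccgcagatggttatctgtatgttttttatatgaa",
    "ttcatataaaaaacatacagataaccatctgcggtgataaattatctctggcggtgttgac",
    "ttttgtgctcatacgttaaatctatcaccgcaagggataaatatctaacaccgtgcgtgtt",
    "aacacgcacggtgttagatatttatcccttgcggtgatagatttaacgtatgagcacaaaa",
    "atcaccgcaagggataaatatctaacaccgtgcgtgttgactattttacctctggcggtga",
    "tcaccgccagaggtaaaatagtcaacacgcacggtgttagatatttatcccttgcggtgat",
    "acaccgtgcgtgttgactattttacctctggcggtgataatggttgcatgtactaaggagg",
    "cctccttagtacatgcaaccattatcaccgccagaggtaaaatagtcaacacgcacggtgt",
]


def column_uncertainty(sites, index):
    """Shannon uncertainty (bits) of one alignment column.

    Raw frequencies stand in for probabilities; no small-sample
    correction is applied.
    """
    counts = Counter(seq[index] for seq in sites)
    n = len(sites)
    # sum_b f * log2(1/f) is the same as -sum_b f * log2(f)
    return sum((c / n) * log2(n / c) for c in counts.values())


def uncertainty_curve(sites, first=-30):
    """Return a dict mapping each position l to H(l)."""
    width = len(sites[0])
    return {first + i: column_uncertainty(sites, i) for i in range(width)}


H = uncertainty_curve(SITES)
for l in (0, 7, 15):
    print(f"H({l:+d}) = {H[l]:.3f} bits")
```

With the data as posted, position +7 is t in every one of the twelve
sequences, so H(+7) comes out as 0 bits, while H(+15) is close to 1.78 bits.
Printing the whole dictionary H gives the curve asked about in question 2.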

| >4.  What is the amount of information that cro sees in this sequence set?
| 
| It is perfectly possible that H(s) does not coincide
| with your idea of "informational content".

Nope.

| Just as "potential energy" may not coincide with one's idea of "energy".
| It doesn't really matter -- use another word if you prefer.
| 
| My question is simply whether H(s) where s is the DNA string
| could make a sensible definition of the "complexity" of a creature.

I believe that that is approaching the problem in a way that is fruitless.
Lots has been written about it and no measurements and no conceptual advances
have been made!  We already know how much DNA there is, and CoT curves of the
"complexity" were done 30 years ago.  It doesn't teach us anything new.  (If
you don't believe that, then tell me something new!)

| >If you work through this example, I am sure it will illuminate your
| >understanding of H (uncertainty) and information!
| 
| I'm just using H(s) to mean 
| the algorithmic (Chaitin) entropy of the string s:
| the length of the shortest program p which will output s
| when fed into a chosen universal Turing machine U:
| 
| H(s) = min {|p|: U(p) = s}
| 
| I haven't said anything about uncertainty or information
| (except to mention that "informational content" 
| is an alternative name used by Chaitin for the measure H(s) -- 
| ignore this if you feel it causes confusion).
| If you think use of the letter 'H' causes confusion, use another letter.

H is not a good choice for algorithmic complexity.

Give it a try.

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www.ncifcrf.gov:2001/~toms
soon: 
  http://www-lmmb.ncifcrf.gov/~toms
works now: 
  http://www-lmmb.ncifcrf.gov/

From owner-info-theory@net.bio.net Tue Feb 07 22:00:00 1995
Path: biosci!bloom-beacon.mit.edu!panix!ddsw1!godot.cc.duq.edu!hudson.lm.com!news.pop.psu.edu!news.cac.psu.edu!howland.reston.ans.net!news.sprintlink.net!EU.net!ieunet!maths.tcd.ie!not-for-mail
From: tim@maths.tcd.ie (Timothy Murphy)
Newsgroups: bionet.info-theory
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Date: 8 Feb 1995 03:31:55 -0000
Organization: Dept. of Maths, Trinity College, Dublin, Ireland.
Lines: 83
Distribution: bionet
Message-ID: <3h9dvb$t7j@hamilton.maths.tcd.ie>
References: <3gmgir$ncd@hamilton.maths.tcd.ie> <D3G7B0.Bp1@ncifcrf.gov> <3h0875$ek1@hamilton.maths.tcd.ie> <D3nJzJ.1tt@ncifcrf.gov>
NNTP-Posting-Host: hamilton.maths.tcd.ie

toms@fcsparc6.ncifcrf.gov (Tom Schneider) writes:

>| >| but it seems to me perfectly sensible to compare the Chaitin entropy
>| >| (or informational content) of DNA -- regarded simply as a string --
>| >| in higher and lower organisms, and argue that the difference 
>| >| does reflect a difference in the "complexity" of the organisms.

>So we are speaking of different things.  How about trying the simple Shannon
>H measure and seeing where it leads you?  You can get real numbers out of
>the data I posted.

But I think I made it perfectly clear that I was speaking of Chaitin entropy.
My question concerned Chaitin entropy,
so there is little point in applying Shannon's more complicated quantity.

>| This is a misunderstanding of the definition of H(s).
>| It is not defined by frequency.
>| (Certainly the frequencies you quote will give an upper bound to H(s),
>| but that is all.)

>H(s) is not H(l).  H(l) is the Shannon uncertainty measure at position l in the
>set of sites.  Shannon uncertainty was defined on probabilities though, not
>frequencies. 

But again, I made it perfectly clear that I was speaking of Chaitin entropy.

>| My question is simply whether H(s) where s is the DNA string
>| could make a sensible definition of the "complexity" of a creature.

>I believe that that is approaching the problem in a way that is fruitless.
>Lots has been written about it and no measurements and no conceptual advances
>have been made!

Could you give me some reference to that, please.
You seem to have 2 criticisms 
(it would surely be better to consider them separately):

(i) It is hard to measure H(s)

(ii) H(s) is not a good measure of whatever one is trying to measure.

>We already know how much DNA there is, and CoT curves of the
>"complexity" were done 30 years ago.  It doesn't teach us anything new.  (If
>you don't believe that, then tell me something new!)

I'm perfectly prepared to look at it, if you give me a reference.

>H is not a good choice for algorithmic complexity.

But I stated perfectly clearly that I was speaking of Chaitin entropy,
and it seems quite reasonable therefore to use Chaitin's notation.
The capital letters are in the public domain.
[You might as well say that Shannon should not have used H
because Boltzmann had done so before him.]

>Give it [ie Shannon entropy] a try.

But you seem to me to have argued that the Shannon entropy of DNA
is _not_ a good measure of the complexity of an organism.

I'm not convinced -- although I could be -- that Chaitin-Kolmogorov entropy
is not in fact a good measure for this purpose.
Certainly if it turned out that H(tomato DNA) > H(human DNA)
one would have some doubts, but this actually seems pretty improbable to me.
Any development is more or less certain to increase H(s)
unless some part of the DNA is deleted.
Eg if a portion of DNA is duplicated that would have hardly any effect;
but if now one of the portions was altered in some way then H(s) would increase.

I get the impression that you have some emotional attachment
to Shannon's statistical definition
over Chaitin's computer-theoretic definition.
I believe that Chaitin's definition in fact includes Shannon's
(reflecting the fact that a Turing machine defines a probability distribution);
however this is probably a side-issue.



-- 
Timothy Murphy  
e-mail: tim@maths.tcd.ie
tel: +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland

From owner-info-theory@net.bio.net Tue Feb 07 22:00:00 1995
Path: biosci!agate!usenet.ins.cwru.edu!lerc.nasa.gov!purdue!haven.umd.edu!hecate.umd.edu!ram
From: ram@mbisgi.umd.edu (Ram Samudrala)
Newsgroups: bionet.info-theory
Subject: Enthalpy and Entropy
Date: 8 Feb 1995 08:57:05 GMT
Organization: The Centre for Advanced Research in Biotechnology
Lines: 54
Message-ID: <3ha111$8qp@hecate.umd.edu>
Reply-To: ram@elan1.carb.nist.gov
NNTP-Posting-Host: indigo3.carb.nist.gov
X-Newsreader: TIN [version 1.2 PL0]

Tom Schneider and I have been having a discussion about the nature of
enthalpy lately.  It started because I recalled him once telling me
that the enthalpy of the system could be looked upon as the entropy of
the system.  This view, which seems weird to some people, is actually
propounded in the book Physical Chemistry by Peter Atkins.  Here he
defines

delta S' = - delta H / T'    --- (1)

Where the ' symbol is used to indicate surroundings.  He gets this
from assuming dS' is directly proportional to the heat, dq', that is
transferred from the system to the surroundings.  Taking temperature
dependence into account, he gets

delta S' = q' / T'

If a reaction has enthalpy change delta H, then the heat that enters
the surroundings at constant pressure is q' = - delta H, and thus (1)
follows.

Tom also writes:

So $\Delta G_{system}$ represents the {\em total\/} entropy
dissipation during a reaction \cite{Darnell1986}.  Atkins has written
a clear discussion that shows that the usefulness of energy comes only
from its entropy-increasing dissipation \cite{Atkins1984}.  (The use
of the term ``enthalpy'' hides this fact.)
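
The step from equation (1) to the statement about $\Delta G_{system}$ can be
spelled out as follows.  This is a sketch that assumes constant pressure and
that system and surroundings are at the same temperature, T' = T; the
subscript "total" is notation introduced here, not taken from the references.

```latex
\begin{align*}
\Delta S_{total} &= \Delta S + \Delta S'
                  = \Delta S - \frac{\Delta H}{T} \\
-T\,\Delta S_{total} &= \Delta H - T\,\Delta S = \Delta G
\end{align*}
```

Under those assumptions a negative $\Delta G$ is the same thing as a positive
total entropy change, scaled by the temperature.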

The references are:

@book{Darnell1986,
author = "J. Darnell
 and H. Lodish
 and D. Baltimore",
title = "Molecular Cell Biology",
publisher = "Scientific American Books, Inc.",
address = "N. Y.",
year = "1986"}

@book{Atkins1984,
author = "P. W. Atkins",
title = "The Second Law",
publisher = "W. H. Freeman and Co.",
address = "N. Y.",
year = "1984"}

So, do you think this is a valid way to think about it?

--Ram

Check out the TWISTED HELICES band page at http://www.wam.umd.edu/~ram/th
Ram Samudrala                                     ram@elan1.carb.nist.gov
     Your shadow, the white one, who you cannot accept and who will never
                                             forget you --- Rolf Jacobson

From owner-info-theory@net.bio.net Wed Feb 08 22:00:00 1995
Path: biosci!bloom-beacon.mit.edu!panix!zip.eecs.umich.edu!caen!uwm.edu!vixen.cso.uiuc.edu!howland.reston.ans.net!news.cac.psu.edu!news.pop.psu.edu!hudson.lm.com!godot.cc.duq.edu!newsfeed.pitt.edu!uunet!zib-berlin.de!fu-berlin.de!fub!unlisys!usenet
From: zabin@berlin.snafu.de (Hal Zabin)
Newsgroups: bionet.info-theory
Subject: Spelling checkers
Date: Thu, 09 Feb 95 07:41:51 MET
Organization: Unlimited Surprise Systems, Berlin
Lines: 3
Message-ID: <3hcdkk$e0a@unlisys.IN-Berlin.DE>
NNTP-Posting-Host: zabin.berlin.snafu.de
Mime-Version: 1.0
X-Newsreader: WinVN 0.93.0

Does anyone know if there are any spelling checkers specifically for the 
biological / medical sciences? Source would be appreciated. Hal


From owner-info-theory@net.bio.net Thu Feb 09 22:00:00 1995
Path: biosci!daresbury!bioftp.unibas.ch!citi2.fr!jussieu.fr!news.oleane.net!oleane!pipex!swrinde!howland.reston.ans.net!agate!dog.ee.lbl.gov!news.cs.utah.edu!news.cc.utah.edu!news
From: marc.fuller@gatormail.cvrti.utah.edu (Marc Fuller)
Newsgroups: bionet.info-theory
Subject: Re: How does this group get this stuff?
Date: 10 Feb 1995 16:04:07 GMT
Organization: CVRTI, University of Utah
Lines: 15
Message-ID: <3hg2pn$fc5@news.cc.utah.edu>
References: <D3rGuK.9L2@world.std.com>
NNTP-Posting-Host: fuller.cvrti.utah.edu
X-Newsreader: WinVN 0.92.6+

In article <D3rGuK.9L2@world.std.com>, rie@world.std.com (Daniel Rie) says:
>
>The posts in this group are all too often unrelated to info-theory.
>How come the title bionet.info-theory attracts such random posts?
>Is this an indication that info-theory has not acheived adaquate
>recognition as a defined disipline?


I think it's simply because a lot of people don't have
a clue what information theory means. I've seen posts
where people equate it with library science. Also there are 
people that post to anything that has a remote chance 
of getting a response. For instance, the recent ad on 
credit repair that showed up in almost every group.
                           Marc Fuller

From owner-info-theory@net.bio.net Thu Feb 09 22:00:00 1995
Path: biosci!daresbury!hgmp.mrc.ac.uk!sunsite.doc.ic.ac.uk!lyra.csx.cam.ac.uk!nntp-serv.cam.ac.uk!sre
From: sre@al.cam.ac.uk (Eddy Sean)
Newsgroups: bionet.info-theory
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Date: 10 Feb 95 08:38:24
Organization: Laboratory of Molecular Biology, MRC, Cambridge UK
Lines: 37
Distribution: bionet
Message-ID: <SRE.95Feb10083824@al.cam.ac.uk>
References: <3gmgir$ncd@hamilton.maths.tcd.ie> <D3G7B0.Bp1@ncifcrf.gov>
	<3h0875$ek1@hamilton.maths.tcd.ie> <D3nJzJ.1tt@ncifcrf.gov>
Reply-To: sre@mrc-lmb.cam.ac.uk
NNTP-Posting-Host: al.mrc-lmb.cam.ac.uk
In-reply-to: toms@fcsparc6.ncifcrf.gov's message of Tue, 7 Feb 1995 23:02:06 GMT


[Re: the discussion between Timothy Murphy and Tom Schneider. Murphy
asks if Chaitin/Kolmogorov algorithmic entropy might be a good measure
of complexity in a genome. Schneider doesn't see any utility in this
argument, and argues for sticking with Shannon entropy.  Everything I
know about information theory I learned by osmosis from Schneider, but
I'll take a deep breath and take Murphy's side here. Possibly you guys
are already way ahead of me, but I think there's a basic unstated issue 
that you're talking around.]

Take the DNA string "AGAGAGAGAGAGAG".  This is a pretty common sort of
thing in a metazoan genome: reiterated simple sequence repeat,
sometimes for hundreds or thousands of bases.

Schneider would argue, a la Shannon, that there's roughly 2
bits/position in this sequence. Seems that Shannon relative entropy is
always going to be a safe upper bound. But I could restate that
sequence as (AG)^7, and therefore communicate it in fewer bits than
Shannon predicts.  This is Murphy's argument, I believe -- an
algorithmic entropy approach.  It has been clear for years that the number
of DNA bases doesn't correlate linearly with "information" in any
reasonable biological sense, because of massive, apparently useless
repetition in many genomes, including our own. A genome complexity
measure has to deal with the repetitive nature of metazoan DNA. A
simple Shannon entropy, as I understand it, will not.  It seems
entirely reasonable to consider algorithmic entropy measures --
though, myself, I don't know how to begin calculating an algorithmic
entropy for DNA sequence.  (Anyone?)
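
To make the (AG)^7 restatement concrete, here is a minimal Python sketch.
It only detects an exact tandem repeat of a single unit, which is far weaker
than a true algorithmic entropy; the function name repeat_unit is made up
for this illustration.

```python
def repeat_unit(seq):
    """Return (unit, count) for the shortest unit whose repetition gives seq."""
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    for size in range(1, n + 1):
        # A candidate unit must divide the length and rebuild seq exactly.
        if n % size == 0 and seq[:size] * (n // size) == seq:
            return seq[:size], n // size


unit, count = repeat_unit("AGAGAGAGAGAGAG")
print(f"({unit})^{count}")   # prints (AG)^7
```

A sequence with no exact repeat simply comes back as itself with count 1,
so the restatement saves nothing in that case.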

I do agree with Tom that this is purely academic, though. A perfect
entropy measure of genome complexity isn't going to tell us anything
we didn't already know from CoT curves.

--
- Sean Eddy
- MRC Laboratory of Molecular Biology, Cambridge, England
- sre@mrc-lmb.cam.ac.uk

From owner-info-theory@net.bio.net Thu Feb 09 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!rutgers!gatech!howland.reston.ans.net!news.cac.psu.edu!news.pop.psu.edu!hudson.lm.com!godot.cc.duq.edu!newsfeed.pitt.edu!uunet!in1.uu.net!world!rie
From: rie@world.std.com (Daniel Rie)
Subject: How does this group get this stuff?
Message-ID: <D3rGuK.9L2@world.std.com>
Summary: bionet.info-theory entropy too high
Keywords: Huh?, 
Organization: The World Public Access UNIX, Brookline, MA
Date: Fri, 10 Feb 1995 01:44:43 GMT
Lines: 10

The posts in this group are all too often unrelated to info-theory.
How come the title bionet.info-theory attracts such random posts?
Is this an indication that info-theory has not achieved adequate
recognition as a defined discipline?

-- 

============================================================================
Dan Rie                             
Scituate, MA

From owner-info-theory@net.bio.net Thu Feb 09 22:00:00 1995
Path: biosci!NBRF.GEORGETOWN.EDU!GARAVELLI
From: GARAVELLI@NBRF.GEORGETOWN.EDU
Newsgroups: bionet.info-theory
Subject: Re: How does this group get this stuff?
Date: 10 Feb 1995 10:05:32 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 70
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <01HMVUEFXH1U9YCFVN@NBRF.Georgetown.Edu>

In message <D3rGuK.9L2@world.std.com> Daniel Rie (rie@world.std.com) asked:

> The posts in this group are all too often unrelated to info-theory.
> How come the title bionet.info-theory attracts such random posts?
> Is this an indication that info-theory has not achieved adequate
> recognition as a defined discipline?

The announced policy of the Biological Information Theory and Chowder Society
is as follows (apologies for this partial reposting, but some repetition is
warranted in this case):
***********************************************************
* What Can I Do About Inappropriate Postings?

The short form of this news group's name, bio-info, can be a little confusing
to some people inexperienced in network communications or with little knowledge
of the discipline (if there is any :-) of biological information theory.  It
can and has been mistaken as a news group for general biological information.
Our readers should be aware that when such postings come to our attention, we
do attempt to inform, privately, the people who make these inappropriate
postings of the error of their ways and suggest alternative or more appropriate
venues.

Subjecting the writers of inappropriate postings to public excoriation is not a
good policy because the mistake is usually inadvertent and the follow-up
postings add further to the irritation of our regular readers.  When others
publicly reply to such posts in this news group, although they may think they
are being polite to the original poster, they are still annoying our regular
readers.  We suggest that a better policy for readers who do wish to reply to
inappropriate posts is to do so privately or to an appropriate news group.
***********************************************************
(from version = 1.67 of bionet.info-theory.faq  1995 January 18)

Of the two most recent inappropriate postings, one was a spam (and I received
positive confirmation that the perpetrator's access to the Internet has already
been terminated) and the other appeared, at first, to be an honest mistake
of the sort mentioned above.  Tom Schneider or I always attempt to contact
these users privately, point out how the posting was inappropriate, suggest
an appropriate posting location, and then inquire where they had found the
posting address so that correcting notices can be placed at the source and
thus inhibit more inappropriate posts.  In most cases we receive polite
apologies, with admissions that the name "bio-info" had misled them in
the short listing of such sources as "The Internet Yellow Pages".

It is unfortunate that such mistakes occur.  We are doing our best to minimize
them.  We do hope our regular users will follow our suggestion to refrain from
posting follow-ups to inappropriate notices.  If you wish to reply, please do
so privately.  If you wish to complain, please do so privately to the 
postmaster of the source node used by the poster, to the board administrator
Dave Kristofferson (biosci-help@net.bio.net), to Tom Schneider
(toms@ncifcrf.gov), or to me (Garavelli@NBRF.Georgetown.EDU).

Sometimes, as in this other recent case, the person turns out to be immature
with limited intelligence or experience of courteous social interactions.  In
such cases no amount of polite correction is going to be helpful.  It is best
to ignore the childish out-bursts and hope they will soon find another board
where they can get the attention they are desperately craving.

As to whether "info-theory has not achieved adequate recognition as a defined
discipline", one can always hope the world will eventually come to the
enlightened realization shared by our regular readers of the sublime
importance of information theory in biology.  Of course, it's always the
unenlightened ones who wind up on our grant and publication review committees.
------------------------------------------------------------------------
                                 Dr. John S. Garavelli
                                 Database Coordinator
                                 Protein Information Resource
                                 National Biomedical Research Foundation
                                 Washington, DC  20007
                                 GARAVELLI@GUNBRF.BITNET
                                 GARAVELLI@NBRF.GEORGETOWN.EDU

From owner-info-theory@net.bio.net Fri Feb 10 22:00:00 1995
Path: biosci!rutgers!gatech!howland.reston.ans.net!news.sprintlink.net!news.clark.net!rahul.net!a2i!olivea!charnel.ecst.csuchico.edu!csusac!csus.edu!csulb.edu!library.ucla.edu!news.ucdavis.edu!jimrice.ucdavis.edu!user
From: btluttbeg@ucdavis.edu (Barney Luttbeg)
Newsgroups: bionet.info-theory
Subject: Re: How does this group get this stuff?
Followup-To: bionet.info-theory
Date: 10 Feb 1995 23:39:30 GMT
Organization: University of California, Davis
Lines: 14
Distribution: world
Message-ID: <btluttbeg-100295165716@jimrice.ucdavis.edu>
References: <D3rGuK.9L2@world.std.com>
NNTP-Posting-Host: jimrice.ucdavis.edu

In article <D3rGuK.9L2@world.std.com>, rie@world.std.com (Daniel Rie)
wrote:
> 
> The posts in this group are all too often unrelated to info-theory.
> How come the title bionet.info-theory attracts such random posts?
> Is this an indication that info-theory has not achieved adequate
> recognition as a defined discipline?
> 

It is equivalent to the population biology group getting constant questions
about human population growth and the ensuing consequences.

Barney Luttbeg
UC Davis  

From owner-info-theory@net.bio.net Fri Feb 10 22:00:00 1995
Path: biosci!rutgers!gatech!howland.reston.ans.net!pipex!uunet!nntp.cac.washington.edu!math.washington.edu!hillman
From: hillman@math.washington.edu (Christopher Hillman)
Newsgroups: bionet.info-theory
Subject: A bibliography of definitions of complexity
Date: 11 Feb 1995 03:48:18 GMT
Organization: "University of Washington, Mathematics, Seattle"
Lines: 64
Distribution: world
Message-ID: <3hhc22$h30@news.u.washington.edu>
NNTP-Posting-Host: ionesco.math.washington.edu
Keywords: biological complexity, algorithmic complexity, DNA, physical entropy

See the URL

http://alphard.cpm.aca.mmu.ac.uk/combib/combib.html

for a very extensive bibliography of papers discussing various measures
of complexity.  This is an interactive, searchable index.  I only just
discovered it, but my first impression is very positive.  See particularly the
keyword index!  I think many of the papers listed here will be of interest to
those trying to quantify biological complexity in various situations.

Many of the papers are quite old (before 1985), but this database seems
like a good way to put together a reading list of what people have
had to say in the past about the relationship between order, complexity,
information, and physical entropy.

Can anyone comment on the following papers/books?

Hinegardner,R; Engelberg,J; 1983,
Biological Complexity,
Journal of Theoretical Biology, 104, 7-20

Kampis,G; Csanyi,V; 1987,
Notes on Order and Complexity,
Journal of Theoretical Biology, 124, 111-121

Serra,R; 1988,
Some Remarks on Different Measures of Complexity
  for the Design of Self-organising Systems, in 
Cybernetics and Systems '88, Eds.Trappl,R;
Kluwer Academic, Dordrecht, pages 141-148

Huberman,BA; Hogg,T; 1986,
Complexity and Adaptation,
Physica D, 22, 376-384

McShea,DW; 1991,
Complexity and Evolution: what everybody knows,
Biology and Philosophy, 6(3), 303-324

Saunders,PT; Ho,MW; 1981,
On the Increase in Complexity in Evolution,
Journal of Theoretical Biology, 90, 515-530

Ebeling,W; Jimenez-Montano,MA; 1980,
On Grammars, Complexity and Information Measures of Biological
   Macromolecules,
Mathematical Biosciences, 52, 53-71

Gusev,VD; Kulichkov,VA; Chupkhina,OM; 1991,
Genome Complexity Analysis 1: Complexity Measures and the
   Classification of Structural Features,
Molecular Biology, 25, 669-677

Kampis,G; 1991,
Self-Modifying Systems in Biology and Cognitive Science: A New Framework for Dynamics,
   Information and Complexity,
Pergamon Press, Oxford




Chris Hillman (who has no time for reading, unfortunately!)



From owner-info-theory@net.bio.net Sat Feb 11 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!newshost.lanl.gov!news.ttu.edu!aurora.LaTech.edu!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: How does this group get this stuff?
Message-ID: <D3vC9w.D0E@ncifcrf.gov>
Keywords: Huh?, 
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
References: <D3rGuK.9L2@world.std.com>
Date: Sun, 12 Feb 1995 03:56:19 GMT
Lines: 47

In article <D3rGuK.9L2@world.std.com> rie@world.std.com (Daniel Rie) writes:

| The posts in this group are all too often unrelated to info-theory.
| How come the title bionet.info-theory attracts such random posts?
| Is this an indication that info-theory has not [achieved] [adequate]
| recognition as a defined [discipline]?

Steve Garavelli and I have been dealing with this on the sidelines for a while
now.  The problem seems to be people new to the net who don't lurk for a while
to find out what the group is about.  Unfortunately the name fools some people
into thinking it's a general information group.  Somehow they manage to miss
the word "theory".  To these I send email and generally they are apologetic.
Steve sends them a formal statement about our groups (and carbon copies to
me).  This way we keep irrelevant discussions to a minimum.

The other class of people are those who knowingly broadcast all over the net.
I am generally not very nice to these junkmailers.

If things get severe at some point, then we could become a moderated group.
The price would be someone's time (mine???) and a delay in postings.  I think
we still have a low enough noise level that it's not a big problem.  Just dump
the messages.  If you want to help, email to the offender and don't post on the
net.  For really bad offenders, you can also email to their postmaster (eg, for
my address that would be postmaster@ncifcrf.gov) suggesting that they boot the
person off the net.  (But don't do that to me please!  ;-)  This is rather
effective.

Steve's message seems to be sufficiently strong that the recent posting on
credit was apparently withdrawn before I could see it.  So we are at least
keeping up with the tide.

As for your last question, information theory is a well defined field.
Communications engineers have all heard about Shannon, and most electrical
engineers have.  Surprisingly, only a fraction of computer scientists I've
spoken to have.  Almost no molecular biologists know about what Shannon did.
This is a reflection of the extreme divisions that we have in our sciences.

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www.ncifcrf.gov:2001/~toms
soon: 
  http://www-lmmb.ncifcrf.gov/~toms
works now: 
  http://www-lmmb.ncifcrf.gov/

From owner-info-theory@net.bio.net Sat Feb 11 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!adam.cc.sunysb.edu!news.sprintlink.net!howland.reston.ans.net!torn!nott!cunews!arena!etabello
From: etabello@arena.carleton.ca (Elias Tabello)
Subject: Re: How does this group get this stuff?
Message-ID: <D3u9np.Dt6@cunews.carleton.ca>
Sender: news@cunews.carleton.ca (News Administrator)
Organization: Carleton University
X-Newsreader: TIN [version 1.2 PL2]
References: <D3rGuK.9L2@world.std.com> <btluttbeg-100295165716@jimrice.ucdavis.edu>
Date: Sat, 11 Feb 1995 14:02:13 GMT
Lines: 13

Why not stop crossposting with other groups?  If you arrive at this
group, say via a search, and it has an item of interest, then of
course there will be replies, even if the item has no relevance to the group.
Why not publish a FAQ and a short introductory note as many other
newsgroups have done?

\/\/\/\/\/Elias Tabello\/\/\/\/etabello@chat.carleton.ca\/\/\/\/\/\/\/\
Only as mind-over-matterist,
as philosopher, scientist,
and informed technician
impersonally and universally preoccupied
is man infallible.
-From No More Secondhand God by RBF

From owner-info-theory@net.bio.net Sun Feb 12 22:00:00 1995
Path: biosci!agate!newsxfer.itd.umich.edu!gatech!swrinde!cs.utexas.edu!convex!convex!news.oryx.com!xdegrm
From: xdegrm@oryx.com (glenn r morton)
Newsgroups: bionet.info-theory
Subject: Yockey Definitions
Date: 13 Feb 1995 14:24:43 GMT
Organization: Oryx Energy Company
Lines: 16
Distribution: world
Message-ID: <3hnq3b$fiu@lm1.oryx.com>
NNTP-Posting-Host: na1.oryx.com

I and another fellow have just finished reading Yockey's excellent
Information Theory and Molecular Biology.  But we disagree on a definition
and would like some help from an expert.

He interprets Yockey's term 'highly organized' as a descriptive term
for the high probability set.  I have disagreed, saying that the high
probability set is the set of permutations which are more likely to be formed
via a random search.  Yockey discusses the probability of two dice rolls on
p. 76.  The totals from a pair of dice are not equally probable.  Rolling a
two is much less likely than rolling a seven.  Under my interpretation of
the high probability set, seven would be in the high probability set and two
would not be.

My friend would counter that Yockey talks about all DNA, mRNA and protein
sequences as being in the high probability set, also on page 76.  I must admit
that this hurts my case, because I can understand proteins being in the high
probability set, since there are multiple codons for a single amino acid and
so not all amino acids are equally probable.  But how all DNA can be in the
high probability set confuses me.

I am afraid that without some expert help from "ON HIGH", neither of us will
be able to convince the other.  Both of us will be persuaded here by a
modified argument from authority - someone telling us. :-)
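
For the dice example, the distribution of totals can be enumerated directly.
The Python sketch below does only that; the function name is made up for
illustration, and the numbers say nothing about which reading of Yockey's
"high probability set" is the right one.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """Exact probability of each total from two fair six-sided dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}


dist = two_dice_distribution()
for total, p in dist.items():
    print(total, p)
```

A total of two can be rolled one way out of 36 and a total of seven six ways
out of 36, so seven is six times as likely as two.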

glenn

From owner-info-theory@net.bio.net Sun Feb 12 22:00:00 1995
Path: biosci!newshost.lanl.gov!news.ttu.edu!simd.ttu.edu!dasheiff
From: dasheiff@simd.ttu.edu (Richard Dasheiff)
Newsgroups: bionet.info-theory,bionet.neuroscience,bionet.software,bionet.software
Subject: conference announcement
Date: 13 Feb 1995 02:27:52 GMT
Organization: Texas Tech Academic Computing Services
Lines: 64
Distribution: world
Message-ID: <3hmg38$f7j@hydra.acs.ttu.edu>
NNTP-Posting-Host: simd.ttu.edu
Xref: biosci bionet.info-theory:3104 bionet.neuroscience:6328 bionet.software:11043


Announcement and Call for Papers
___________________________________________________________________
CCCC BBB  M   M SSSS "" 9999 5555  ! THE 8TH IEEE SYMPOSIUM ON
C  C B B  MM MM SS   "" 9  9 5     ! COMPUTER-BASED MEDICAL SYSTEMS
C    BBB  M M M  SS     9999 5555  !-------------------------------
C  C B  B M   M   SS       9    5  ! Sponsored by IEEE, SPIE and in
CCCC BBBB M   M SSSS    9999 5555  ! cooperation with Texas Tech U.

	      Lubbock Plaza Hotel & Convention Center
		Lubbock, Texas, June 09-11, 1995
                --------------------------------

Organized by the IEEE Computer Society, IEEE Engineering in Medicine and
Biology Society, IEEE Computational Medicine Society, and SPIE (The
International Society for Optical Engineering), in cooperation
with Texas Tech University & Texas Tech University Health Sciences
Center, this forum will focus on recent cutting-edge developments
in engineering and the sciences and their applications in the medical
field.  The objective of the conference is to bring together researchers,
educators, and engineers from various disciplines, including
traditional and emerging areas of Information and Expert Systems and
Signal Analysis, for the enhancement of existing medical technologies and
the reduction of health-care costs.  Authors are invited to contribute
their work for presentation at the symposium in the form of one-page
abstracts, typed single-spaced, before January 31, 1995.  The topics of
interest are broadly categorized into:

	Device Reliability and Safety;
	Signal-Image Analysis Algorithms, Software, and Hardware;
	Information Systems;
	Neural Networks and Expert Systems;
	Prosthetic Devices;
	Cardiovascular Technologies;
	Clinical Assessment and Risk Evaluation.

The abstracts will be processed as they come in, and decisions on 
selection will be promptly communicated to the authors not later 
than March 1, 1995. All abstracts and inquiries must be sent to the 
address below:
Dr. Sunanda Mitra, General Chair,
Dept. of Electrical Engineering, MS. 3102
Texas Tech University,
Lubbock, TX 79409-3102

Abstracts may be submitted by FAX by dialing to USA: (806)-742-1245.

P.S. Should you desire to e-mail your abstracts, you may send them
directly to me:
*******************************************************************
            TEXAS TECH UNIVERSITY HEALTH SCIENCES CENTER       
         ----------------------------------------------------------
   _     Arthur Petrosian,Ph.D.| Phone:(806)743-2495, (806)797-4307     
 _| ~-.  Assistant Professor   | Fax:  (806)743-1668, (806)743-1419
 \,* _}   3601 4th Street      | Page: (806)767-6927 
   \(     Lubbock, Texas 79430 | E-mail: neuaap@ttuhsc.edu        
*******************************************************************
posted by:

-- 
 ~.~    -_-    +_+    ^=^    `=`     :)    >:->    >>:-(    :-)    {:-)    ;)

              (.        dasheiff@simd.ttu.edu      (.
              (.^)                                 (. )

From owner-info-theory@net.bio.net Sun Feb 12 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!rutgers!uwm.edu!spool.mu.edu!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Message-ID: <D3yFrE.82I@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
References: <3h0875$ek1@hamilton.maths.tcd.ie> <D3nJzJ.1tt@ncifcrf.gov> <3h9dvb$t7j@hamilton.maths.tcd.ie>
Distribution: bionet
Date: Mon, 13 Feb 1995 20:04:25 GMT
Lines: 118

In article <3h9dvb$t7j@hamilton.maths.tcd.ie> tim@maths.tcd.ie (Timothy Murphy)
writes:

| But I think I made it perfectly clear that I was speaking of Chaitin entropy.

Ok.

| My question concerned Chaitin entropy,
| so there is little point in applying Shannon's more complicated quantity.

Shannon's measure is pretty simple to implement, and the equation is simple.
Measuring the Chaitin entropy - is there a simple algorithm?  Are there
programs available?  Not that I've heard of.  Chris?

| >I believe that that is approaching the problem in a way that is fruitless.
| >Lots has been written about it and no measurements and no conceptual advances
| >have been made!
| 
| Could you give me some reference to that, please.

I have no references to the absence of measurements.  That may simply reflect
my missing them or it may mean they don't exist.  In molecular biology there
are no papers that give the Chaitin entropy of DNA sequences, unless they are
recent or obscure.  If someone knows of them, please post!

| You seem to have 2 criticisms
| (it would surely be better to consider them separately):
| 
| (i) It is hard to measure H(s)

Apparently it is.

| (ii) H(s) is not a good measure of whatever one is trying to measure.

That's too general.  My question would be more specific:  what is knowing this
measure going to tell you?  Or turn it around and do the measure on a bunch of
DNA sequences and see if it tells you something.  It might be useful.

| >We already know how much DNA there is, and CoT curves of the
| >"complexity" were done 30 years ago.  It doesn't teach us anything new.  (If
| >you don't believe that, then tell me something new!)
| 
| I'm perfectly prepared to look at it, if you give me a reference.

@article{Britten1968,
author = "R. J. Britten
 and D. E. Kohne",
title = "Repeated Sequences in {DNA}",
journal = "Science",
volume = "161",
pages = "529-540",
year = "1968"}

The CoT number is probably proportional to the information content of the
genome.

| >Give it [ie Shannon entropy] a try.
| 
| But you seem to me to have argued that the Shannon entropy of DNA
| is _not_ a good measure of the complexity of an organism.

To avoid confusion, I call "Shannon entropy" the uncertainty
H = - sum Pi log2 Pi (units: bits per state or symbol).  I was pointing out
that this corresponds to standard entropy S = - k sum Pi ln Pi (units: joules
per kelvin) if the probabilities Pi refer to the same states.  So higher
entropy cannot correspond to more complexity.  To avoid this confusion - which
is rampant in the literature - I always suggest measuring information as a
decrease in uncertainty.  As for "complexity", I don't know a precise
definition of that and avoid it along with "specificity" and "consensus
sequence".
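
As a minimal sketch of how little code the measure needs, the uncertainty H
and the information as a decrease in uncertainty can be written in a few
lines of Python.  The function names and the example probabilities are
assumptions made up for illustration, not measurements.

```python
import math


def uncertainty(probs):
    """H = -sum(Pi * log2(Pi)) in bits per symbol; zero terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information(before, after):
    """Information as the decrease in uncertainty, H_before - H_after."""
    return uncertainty(before) - uncertainty(after)


equiprobable = [0.25, 0.25, 0.25, 0.25]   # four bases, all equally likely
skewed = [0.7, 0.1, 0.1, 0.1]             # hypothetical frequencies at a site

print(uncertainty(equiprobable))           # 2.0 bits
print(round(information(equiprobable, skewed), 3))
```

With four equally likely symbols the uncertainty is exactly 2 bits, and any
departure from equal probabilities gives a positive information value.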

| I'm not convinced -- although I could be -- that Chaitin-Kolmogorov entropy
| is not in fact a good measure for this purpose.

It might be a good measure.  But one has to go and try it.  I suspect that
nobody could get numbers for it.  If they could, then surely they could do it
for sequences that already exist in GenBank.

| Certainly if it turned out that H(tomato DNA) > H(human DNA)
| one would have some doubts, but this actually seems pretty improbable to me.

It is already known that the complexity (in the Britten and Kohne sense, which
is really just an information measure!) of some organisms is higher than
humans.  It's just another example to deflate our enlarged egos about our place
in the universe.  ;-)

| Any development is more or less certain to increase H(s)
| unless some part of the DNA is deleted.
| Eg if a portion of DNA is duplicated that would have hardly any effect;
| but if now one of the portions was altered in some way then H(s) would increase.

Sounds reasonable, but can you program it?
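
One rough way to program it, offered only as a sketch: the Chaitin entropy
itself is not computable, but the length of a compressed copy of a sequence
is an upper bound on its description length (up to the fixed size of the
decompressor), so a general-purpose compressor can stand in for H(s).  The
Python below uses zlib as that stand-in on a made-up random sequence; the
helper names, the 2000-base length and the 5% mutation rate are all
arbitrary assumptions.

```python
import random
import zlib


def compressed_bytes(seq):
    """Length of the zlib-compressed sequence: a crude upper-bound proxy."""
    return len(zlib.compress(seq.encode("ascii"), 9))


def mutate(seq, fraction, rng):
    """Replace a given fraction of positions with a different base."""
    out = list(seq)
    for i in rng.sample(range(len(seq)), int(len(seq) * fraction)):
        out[i] = rng.choice([b for b in "ACGT" if b != out[i]])
    return "".join(out)


rng = random.Random(0)
original = "".join(rng.choice("ACGT") for _ in range(2000))

base = compressed_bytes(original)
cost_of_duplicate = compressed_bytes(original + original) - base
cost_of_altered_copy = compressed_bytes(
    original + mutate(original, 0.05, rng)) - base

print("original:", base, "bytes")
print("extra for exact duplicate:", cost_of_duplicate, "bytes")
print("extra for altered duplicate:", cost_of_altered_copy, "bytes")
```

An exact duplicate should cost only a few extra bytes, while the altered copy
costs more, which is the behaviour described in the quoted text.  The
absolute numbers depend on the compressor and mean little by themselves.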

| I get the impression that you have some emotional attachment
| to Shannon's statistical definition
| over Chaitin's computer-theoretic definition.
| I believe that Chaitin's definition in fact includes Shannon's
| (reflecting the fact that a Turing machine defines a probability distribution);
| however this is probably a side-issue.

Sure, I'm attached to it because I've worked with the Shannon measures on real
biological systems and found them to give rather spectacular results at times.
I am not aware that anything has been done with the Chaitin measure.  There is
nothing stopping people from doing it if that measure can actually be made!
This news group is for the discussion of information theory approaches to
biology, but any rigorous mathematical method might be useful and people could
discuss it here.  Just because Shannon's measure works in some cases doesn't
mean we should stop the search for methods.

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www.ncifcrf.gov:2001/~toms
soon: 
  http://www-lmmb.ncifcrf.gov/~toms
works now: 
  http://www-lmmb.ncifcrf.gov/

From owner-info-theory@net.bio.net Mon Feb 13 22:00:00 1995
Path: biosci!bcm!pendragon!fullfeed!NewsMaster
From: Periannan Senapathy <sena@genome.com>
Newsgroups: bionet.info-theory
Subject: Genome Research Refutes Evolution
Date: 14 Feb 1995 20:30:38 GMT
Organization: FullFeed Communications (Internet +1.608.246.2701 info)
Lines: 63
Message-ID: <3hr3te$5ab@fullfeed.msn.fullfeed.com>
NNTP-Posting-Host: genome.msn.fullfeed.com

Hello:

     As a molecular biologist and genome researcher,
I have enjoyed following the many ongoing debates in this
and other forums over evolution theory--both as a whole,
and various aspects thereof.  My own work in genome mechanics
and genetic molecular structures has yielded much evidence
pertaining to these debates, and over the years I have
published several of my findings in PNAS, J Molec Biol,
J Biol Chem, Nucleic Acids Research, Science and other journals.
     
    Until recently I have published these findings separately,
although clearly they are all related.  Now, however, I am
publishing a single unified theory that incorporates all
of these pieces--and an enormous body of other evidence as well.
This new unified theory proposes a radically alternative
explanation for the origin and diversity of life on Earth,
asserting that most of Earth's organisms must have originated
independently in one primordial pond, and that the
natural-selection mechanism described by evolution theories
could have produced only minor variations among essentially
similar species.  These conclusions surely will provoke a
lively debate in the scientific community, but a fair reading
of the theory will show that it easily explains all of the
available evidence--molecular, biochemical, organismal
and fossil--and notably accommodates all of the contra-evolution
evidence that has dogged evolutionists since Darwin.

    To stimulate and inform the debate, I have prepared a
Web Page for the World-Wide Web that provides much more
information about the theory, including the complete text
of the Preface and first chapter of my new book,
"Independent Birth of Organisms"--a comprehensive articulation
of the entire theory.  If you are interested in more information
you are  invited to browse these materials at

            http://www.fullfeed.com/genome/

These materials, and the book itself, are engagingly written
to be accessible to educated lay readers, but the book is also
fully annotated with my research and other technical data for
scientific corroboration.

    If you do not have access to the World-Wide Web, you may
obtain more information via gopher at:

        gopher.fullfeed.com

in the Genome International directory or by sending email to:

        listserv@fullfeed.com

leaving the subject line blank and typing

        get genome catalog

in the body of the message. You will receive a catalog of 
available files and instructions.


-- Periannan Senapathy, Ph.D.



From owner-info-theory@net.bio.net Mon Feb 13 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!bcm!cs.utexas.edu!swrinde!emory!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Message-ID: <D408CM.8Hz@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
References: <3h0875$ek1@hamilton.maths.tcd.ie>> <D3nJzJ.1tt@ncifcrf.gov> <SRE.95Feb10083824@al.cam.ac.uk>
Distribution: bionet
Date: Tue, 14 Feb 1995 19:19:33 GMT
Lines: 46

In article <SRE.95Feb10083824@al.cam.ac.uk> sre@mrc-lmb.cam.ac.uk
(Sean Eddy) writes:

| [Re: the discussion between Timothy Murphy and Tom Schneider. Murphy
| asks if Chaitin/Kolmogorov algorithmic entropy might be a good measure
| of complexity in a genome. Schneider doesn't see any utility in this
| argument, and argues for sticking with Shannon entropy.  Everything I
| know about information theory I learned by osmosis from Schneider, but
| I'll take a deep breath and take Murphy's side here. Possibly you guys
| are already way ahead of me, but I think there's a basic unstated issue 
| that you're talking around.]

| Take the DNA string "AGAGAGAGAGAGAG".  This is a pretty common sort of
| thing in a metazoan genome: reiterated simple sequence repeat,
| sometimes for hundreds or thousands of bases.
| 
| Schneider would argue, a la Shannon, that there's roughly 2
| bits/position in this sequence.

It is true that I often assume independence of position, to simplify matters or
because there are not enough data to calculate the higher order models.  But if
you read Shannon's papers you will find ones about the entropy of English.
It's rather fun.  He shows that the information in strings like you gave is
low, since the bases are not independent.  He then builds up a series of models
for English.  This is a really fun thing to do: build probability tables for 1
long, 2 long, 3 long etc sets of letters.  Then synthesize strings from them.
At a certain point it sounds like English - or German or whatever language you
choose, although it is gibberish!
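
The table-building game can be sketched in a few lines of Python.  This is
only an illustration of the idea, not Shannon's own procedure; the function
names are made up, and the training text is whatever you care to supply.

```python
import random
from collections import defaultdict


def build_table(text, order):
    """Map each run of `order` symbols to the symbols seen right after it."""
    table = defaultdict(list)
    for i in range(len(text) - order):
        table[text[i:i + order]].append(text[i + order])
    return table


def synthesize(table, start, length, rng):
    """Grow a string from `start` by sampling followers from the table."""
    order = len(start)            # must match the order used for the table
    out = start
    while len(out) < length:
        followers = table.get(out[-order:])
        if not followers:         # context never seen in the training text
            break
        out += rng.choice(followers)
    return out


table = build_table("AGAGAGAGAGAGAG", order=1)
print(synthesize(table, "A", 10, random.Random(0)))   # prints AGAGAGAGAG
```

For the repeat, an order-1 table already fixes every next base, so the
synthesized string is fully determined.  Fed a long English text with order
2 or 3, the same code produces the English-sounding gibberish described
above.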

Algorithmic complexity goes way beyond this and says that there exists an
algorithm which can generate the sequence.  An example is the sequence of pi,
3.14159 ... which looks "random" but can be generated by a reasonably small
program.  The hard part of this idea, as I understand it, is that there is no
algorithm for getting the algorithm!  So nobody can make practical
measurements.  If you can't get numbers, what good is the method?
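
As an illustration of how small such a program can be, here is a Python
sketch that streams the decimal digits of pi.  It uses a variant of Gibbons'
unbounded spigot algorithm, picked here only for brevity; the generator name
is made up.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi one at a time, using integers only."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is settled: emit it and rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not settled yet: fold in one more term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


print(list(islice(pi_digits(), 10)))   # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
```

The program is a fixed handful of lines however many digits are requested,
which is the sense in which the digit string has a short description.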

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www.ncifcrf.gov:2001/~toms
soon: 
  http://www-lmmb.ncifcrf.gov/~toms
works now: 
  http://www-lmmb.ncifcrf.gov/

From owner-info-theory@net.bio.net Mon Feb 13 22:00:00 1995
Path: biosci!NBRF.GEORGETOWN.EDU!GARAVELLI
From: GARAVELLI@NBRF.GEORGETOWN.EDU
Newsgroups: bionet.info-theory
Subject: Re: THE BELL CURVE
Date: 14 Feb 1995 11:24:00 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 29
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <01HN1J9QBYF69YCIHG@NBRF.Georgetown.Edu>
NNTP-Posting-Host: net.bio.net

In message <60.2464.3075.0N1CFD8C@canrem.com> Mike Mehta
(mike.mehta@canrem.com) proposed some general questions on the recent, and
somewhat controversial, book "The Bell Curve".  While this book and discussions
about it may have some interest to the readers of bionet.info-theory, none of
the questions Mike proposes seems to involve biological applications of
information theory.  In this case we are not certain whether Mike has
drawn a wrong conclusion about the purpose of bionet.info-theory, as some
recent misdirected readers have, or whether he is an information theorist with
an interest in the "hot" topic of this book.  Either way, we would suggest that
any follow-up discussions on this thread would probably be more appropriately
directed to bionet.general and not to bionet.info-theory.

The FAQ for this group is posted once or twice a month, but if anyone
has nonetheless managed to miss this highly informative and entertaining
document, copies can be obtained from either of us.
------------------------------------------------------------------------
                                 Dr. John S. Garavelli
                                 Database Coordinator
                                 Protein Information Resource
                                 National Biomedical Research Foundation
                                 Washington, DC  20007
                                 GARAVELLI@GUNBRF.BITNET
                                 GARAVELLI@NBRF.GEORGETOWN.EDU

                                 Tom Schneider
                                 National Cancer Institute
                                 Laboratory of Mathematical Biology
                                 Frederick, Maryland  21702-1201
                                 toms@ncifcrf.gov

From owner-info-theory@net.bio.net Mon Feb 13 22:00:00 1995
Path: biosci!rutgers!uwm.edu!news.moneng.mei.com!howland.reston.ans.net!news.sprintlink.net!uunet!world!news.bu.edu!ppp-83-4.bu.edu!user
From: famulus@acs.bu.edu (Alex Kasman)
Newsgroups: bionet.info-theory
Subject: DNA Programming
Date: 14 Feb 1995 14:55:29 GMT
Organization: Boston University
Lines: 73
Message-ID: <famulus-1402950955540001@ppp-83-4.bu.edu>
NNTP-Posting-Host: ppp-83-4.bu.edu


Some kind reader of this newsgroup has suggested that I post my
(unpopular) opinions here to start an "interesting" discussion.  Let me
start by saying:

1) I am a mathematician by profession, but I do know quite a bit about
biology and have some training as a biologist.

2) I think making use of DNA for programming has tremendous potential
(just look at the wonderful DNA programs that evolution has written), but
(HERE IS THE UNPOPULAR PART) ...

3) I was not pleased with the recent "hype" concerning the work of Adleman
[Science, November 1994]  which I am told has been receiving much
discussion in this group.  (I haven't been reading.)

Below is a letter which I wrote to Science (and which was not published)
concerning this article.  I realize that the true significance of the
results is debatable, and I look forward to a friendly discussion about
it here.

---------------------------------------

Adleman ["Molecular Computation of Solutions to Combinatorial
Problems", Science, 11 November 1994] claims to give an example of
"carrying out computations at the molecular level".  Therein, he shows
that he can determine whether a particular directed graph has a
Hamiltonian path by observing the end products of a reaction involving
DNA.  In particular, the author recognized that the mathematical
question of the existence of a Hamiltonian path is equivalent to the
possibility of producing double stranded DNA with certain properties
from a particular mixture of oligo-nucleotides.  (Furthermore, he was
able to determine a laboratory procedure for finding DNA with these
properties).  Is it then reasonable to say that he has used the DNA as
a molecular computing device?
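
For readers who want the mathematical question spelled out, here is a rough
software analogue of the generate-then-filter idea: produce every candidate
walk through the graph, then discard those with the wrong endpoints or a
repeated vertex.  This is a sketch only.  Exhaustive enumeration stands in
for the random assembly of oligonucleotides, and the function names and the
example graph are invented for illustration, not taken from the paper.

```python
def candidate_walks(edges, n_vertices):
    """Every walk that visits exactly n_vertices vertices (repeats allowed)."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, []).append(b)
    walks = [(v,) for v in range(n_vertices)]
    for _ in range(n_vertices - 1):
        walks = [w + (b,) for w in walks for b in successors.get(w[-1], [])]
    return walks

def hamiltonian_paths(edges, n_vertices, v_in, v_out):
    """Filter the 'soup' of walks down to Hamiltonian paths from v_in to v_out."""
    soup = candidate_walks(edges, n_vertices)
    # Keep walks with the required first and last vertex.
    soup = [w for w in soup if w[0] == v_in and w[-1] == v_out]
    # Keep walks in which every vertex appears (and hence appears once).
    soup = [w for w in soup if len(set(w)) == n_vertices]
    return soup

if __name__ == "__main__":
    # Hypothetical 5-vertex directed graph.
    edges = [(0, 1), (0, 2), (1, 2), (2, 1), (1, 3), (2, 3), (3, 4)]
    print(hamiltonian_paths(edges, 5, 0, 4))
```

The number of candidate walks grows exponentially with the number of
vertices, which is why doing the generation step chemically, in parallel, is
the interesting part.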

Mathematical objects have well defined properties, but no particular
"real" interpretation.  By finding real systems which have these
properties, one can draw conclusions about the real system from the
"mathematical model".  This is generally referred to as "applied
mathematics".  It is theoretically possible to do things in the other
direction.  That is, finding a real system and a mathematical model,
one could answer a mathematical question by examining the real
situation.  For example, the solution to a certain differential
equation can be interpreted as the temperature of a hot yam losing
heat in a room of fixed temperature ["Calculus" by Hughes Hallett, et
al., Wiley 1994, p. 508].  Consequently, one could continuously
measure the temperature of an actual yam and thereby solve one
particular differential equation.  Still, nobody would claim that this
demonstrates the feasibility of solving differential equations using
yams.
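
As a concrete version of the yam example, assume the equation in question is
Newton's law of cooling, dT/dt = -k (T - T_room).  That is my assumption; the
textbook page is not reproduced here.  The rate constant and temperatures
below are made-up numbers, and the function names are hypothetical.  The
sketch compares a step-by-step numerical solution with the closed form, which
is the quantity a thermometer stuck in the yam would be "computing".

```python
from math import exp

def yam_temperature_euler(t_end, dt=0.01, k=0.03, t0=200.0, room=20.0):
    """Integrate dT/dt = -k (T - room) with forward Euler steps of size dt."""
    steps = round(t_end / dt)
    temp = t0
    for _ in range(steps):
        temp += dt * (-k * (temp - room))
    return temp

def yam_temperature_exact(t_end, k=0.03, t0=200.0, room=20.0):
    """Closed-form solution T(t) = room + (T0 - room) exp(-k t)."""
    return room + (t0 - room) * exp(-k * t_end)

if __name__ == "__main__":
    for minutes in (0, 30, 60):
        print(minutes, yam_temperature_euler(minutes), yam_temperature_exact(minutes))
```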

The paper by Adleman is useful and interesting, and it has brought
attention to the potential for developing molecular computing devices.
However, the particular results seem to be a better example of
"inverse" applied mathematics than of programming or the sort of
molecular Turing machine which he mentions.

---------------------

Note (added to news posting, not to letter): I realize that a calculator
also is just a system whose behavior is the same as a mathematical system
(arithmetic) and that this is why it is useful.  However, I think the
significant thing is that, since we have developed circuits that are
equivalent to the boolean operators and the functions of a Turing machine,
we can make electrical circuits do anything (within reason).  On the other
hand, I see nothing like this in Adleman's work; just the fact that DNA
anneals only for particular base pairings.

-- 
Alex Kasman
Department of Mathematics
Boston University

From owner-info-theory@net.bio.net Mon Feb 13 22:00:00 1995
Newsgroups: bionet.info-theory
Subject: THE BELL CURVE
From: mike.mehta@canrem.com (Mike Mehta)
Path: biosci!rutgers!gatech!howland.reston.ans.net!news.sprintlink.net!uunet!pipex!bnr.co.uk!bmdhh222.bnr.ca!bcarh8ac.bnr.ca!bcarh189.bnr.ca!nott!torn!uunet.ca!uunet.ca!portnoy!canrem.com!mike.mehta
Distribution: world
Message-ID: <60.2464.3075.0N1CFD8C@canrem.com>
Date: Mon, 13 Feb 95 21:00:00 -0500
Organization: CRS Online  (Toronto, Ontario)
Lines: 34


Hello,

I am reading a book called "The Bell Curve" by Richard Herrnstein and
Charles Murray.  It is a book about intelligence and class structure
in American life.  Since I have started reading this book, I have
become extremely interested in intelligence and genetics.  I was
hoping to receive as much information: opinions, facts, recent
updates, regarding these topics.  There are hundreds of questions I am
dying to ask, so to start:

-  What is intelligence defined as?  Does it have multiple
definitions?

-  What methods exist of measuring intelligence?  How accurate are
they?  Can I obtain copies of these tests or test questions?

-  Is Intelligence generally considered to be due to genetics or
environment?  What evidence is there supporting either?

-  What do you people see as the consequences of the revelation that
intelligence may differ ON AVERAGE by racial group?

-  Have scientists found an 'intelligence gene'?

-  How does the brain function to form thought?  Is it just a reaction
to various chemicals being shifted around?  If that is true, can we
accurately predict every thought of the mind?

As well, information on cases regarding tests done on identical twins,
or children and parents, etc.  would be appreciated.  I am open to any
and all comments and opinions.  Please respond,

Mike

From owner-info-theory@net.bio.net Mon Feb 13 22:00:00 1995
Path: biosci!news.cs.umb.edu!hsdndev!purdue!haven.umd.edu!hecate.umd.edu!ram
From: ram@mbisgi.umd.edu (Ram Samudrala)
Newsgroups: bionet.info-theory
Subject: Re: Enthalpy and Entropy
Date: 13 Feb 1995 23:38:50 GMT
Organization: The Centre for Advanced Research in Biotechnology
Lines: 15
Message-ID: <3hoqia$hph@hecate.umd.edu>
References: <3ha111$8qp@hecate.umd.edu>
Reply-To: ram@elan1.carb.nist.gov
NNTP-Posting-Host: indigo3.carb.nist.gov
X-Newsreader: TIN [version 1.2 PL0]

Ram Samudrala (ram@mbisgi.umd.edu) wrote:

>that the enthalpy of the system could be looked upon as the entropy of
>the system.
     ^^^^^^

That should be "surroundings".  Sorry.  Or better, the enthalpy change
of the system is related to the entropy change of the surroundings.
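
In symbols, and assuming constant temperature T and pressure P (the usual
textbook setting, which the post does not state explicitly), the relation is:

```latex
% Assumes constant temperature T and pressure P, so that the heat released
% by the system, -\Delta H_sys, is absorbed by the surroundings at T.
\Delta S_{\mathrm{surr}} = -\frac{\Delta H_{\mathrm{sys}}}{T},
\qquad
\Delta S_{\mathrm{univ}}
  = \Delta S_{\mathrm{sys}} - \frac{\Delta H_{\mathrm{sys}}}{T}
  = -\frac{\Delta G_{\mathrm{sys}}}{T}.
```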

--Ram

Check out the TWISTED HELICES band page at http://www.wam.umd.edu/~ram/th
ram@elan1.carb.nist.gov            Daddy's lil girl ain't a girl no more.  
This is outrage and it's gross.    This is getting to me and I'm drowned. 
                        I'm a negative creep and I'm stoned!   ---Nirvana

From owner-info-theory@net.bio.net Mon Feb 13 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!news.sprintlink.net!alfa02.medio.net!netnews.nwnet.net!nntp.cac.washington.edu!math.washington.edu!hillman
From: hillman@math.washington.edu (Christopher Hillman)
Newsgroups: bionet.info-theory
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Date: 14 Feb 1995 02:38:00 GMT
Organization: "University of Washington, Mathematics, Seattle"
Lines: 89
Distribution: bionet
Message-ID: <3hp528$9kh@news.u.washington.edu>
References: <D3yFrE.82I@ncifcrf.gov>
NNTP-Posting-Host: escher.math.washington.edu

In article <D3yFrE.82I@ncifcrf.gov>,
toms@fcsparc6.ncifcrf.gov (Tom Schneider) writes:

|> Shannon's measure is pretty simple to implement, and the equation is
|> simple.
|> Measuring the Chaitin entropy - is there a simple algorithm?  Are there
|> programs available?  Not that I've heard of.  Chris?

Chaitin's algorithmic entropy or algorithmic complexity (of a finite
sequence) is known to be uncomputable, so there exists no
GENERAL algorithm (that's a THEOREM!).  For a particular sequence,
it may be possible to find the exact value, but such a sequence must be
very short or have some special property.  I find it plausible there might
be some (undiscovered?) algorithm for computing the complexity of a special
class of sequences, but I find it very IMPLAUSIBLE that one could even HOPE
to compute the exact value for a long piece of DNA.

On the other hand, it is sometimes possible to give an upper bound for the
complexity by writing a program and proving that this program produces the
sequence in question.  And as I explained in my long post called IDAC, there
ARE theorems
giving precise, quantitative relationships between algorithmic complexity,
Shannon's source entropy, Kolmogorov's metric entropy, and topological
entropy, not to mention my algebraic entropy.  But these theorems certainly
do NOT apply to DNA!
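
One practical way to get such an upper bound is to run a general-purpose
compressor: the compressed file plus the fixed-size decompressor is a program
that reproduces the sequence.  The sketch below uses Python's zlib module as
a stand-in (my choice, not anything proposed in this thread), and the helper
name compressed_bits is invented here.  The number it returns bounds the
description length from above, up to the constant size of the decompressor;
it says nothing about how far above the true complexity the bound sits.

```python
import random
import zlib

def compressed_bits(seq):
    """Length in bits of the zlib-compressed sequence (an upper-bound proxy)."""
    return 8 * len(zlib.compress(seq.encode("ascii"), 9))

if __name__ == "__main__":
    repeat = "AG" * 500                      # simple sequence repeat, 1000 bases
    rng = random.Random(0)                   # fixed seed, for repeatability only
    scrambled = "".join(rng.choice("ACGT") for _ in range(1000))
    print("repeat:   ", compressed_bits(repeat), "bits")
    print("scrambled:", compressed_bits(scrambled), "bits")
```

The repeat compresses to a small fraction of the size of the scrambled
sequence, even though both are 1000 bases long.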

Indeed, I don't see why Tim Murphy thinks algorithmic complexity makes sense
in the DNA context, because the precise values depend on what universal
Turing machine you are writing your programs to run on.  As I explained
in a post last year, this problem disappears in the case of complexity per
bit computed for infinite sequences as a suitable limit, but any actual
genome is finite.  Maybe he's only interested in relative complexity?
 
|> | Certainly if it turned out that H(tomato DNA) > H(human DNA)
|> | one would have some doubts, but this actually seems pretty improbable to me.
|> 
|> It is already known that the complexity (in the Britten and Kohne sense, which
|> is really just an information measure!) of some organisms is higher than
|> humans.  It's just another example to deflate our enlarged egos about our place
|> in the universe.  ;-)

Any chance of your summarizing briefly the definition of this information
measure?

Tim Murphy wrote (addressing Tom Schneider):

|> | I get the impression that you have some emotional attachment
|> | to Shannon's statistical definition
|> | over Chaitin's computer-theoretic definition.
|> | I believe that Chaitin's definition in fact includes Shannon's
|> | (reflecting the fact that a Turing machine defines a probability
|> | distribution);
|> | however this is probably a side-issue.

Cover and Thomas, Elements of Information Theory, Wiley, 1991, give a simple
asymptotic relation between Shannon's source entropy (for a Bernoulli source,
as I recall) and the algorithmic complexity of a TYPICAL sequence produced
by that source (typical = ``plausible'' in the sense of Shannon).
But it is simplistic (and incorrect) to say that Chaitin's notion of entropy
``includes'' Shannon's notion; rather, they are quantitatively related by
various asymptotic relations and upper bounds.  Chaitin's entropy could
not possibly ``include'' Shannon's entropy because these two quantities
are defined for completely different types of creatures: a finite sequence
does not canonically define a Shannonian type message source, and while
a source does (roughly) define a set of typical or plausible finite
sequences, I don't see why it would define a UNIQUE sequence.   So there is
no way of establishing a bijection between sources and a subset of the
set of all finite binary sequences, which would be necessary if it were
true that Shannon's notion of entropy is ``included in'' (a special case
of) Chaitin's notion of entropy.
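
For reference, the asymptotic relation has roughly the following form.  I am
writing it from memory of the standard statement for an i.i.d. source on a
finite alphabet, so treat the exact hypotheses as an assumption and check the
book for the precise bounds:

```latex
% X_1, X_2, ... drawn i.i.d. from a finite alphabet, with entropy H(X) per symbol.
% K(x^n | n) is the algorithmic (Kolmogorov-Chaitin) complexity of the string
% x^n = x_1 ... x_n, given its length n.
\lim_{n \to \infty} \frac{1}{n}\, \mathbb{E}\!\left[ K(X^n \mid n) \right] = H(X)
```

Note that the left side averages over sequences drawn from the source; it
relates an EXPECTED complexity to the entropy, and does not assign an entropy
to any single finite string.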

On the bright side, most of the entropies I mentioned above (not Chaitin's,
yet, although I'm pretty close) are indeed special cases of what I have
called entropy valuations on a joinset in a preprint called
``A Formal Theory of Information'', Part I of which is available from me
via email.  I go beyond Tom in my prediction that ``entropies'' which are
not closely related to Shannon's entropy, to algorithmic complexity, or
for that matter to physical entropy, but which measure different notions
of biological complexity, may come to play a serious role in biology in
the next century--- in particular, in such areas as quantifying the
evolving ``complexity'' of a developing embryo or an ecosystem.
(This doesn't contradict his assertion that Shannon's mutual information
is well suited to analyzing binding sites--- I'm just saying I doubt
it's well suited to analyzing different types of biological complexity).

Chris Hillman

From owner-info-theory@net.bio.net Tue Feb 14 22:00:00 1995
Path: biosci!rutgers!gatech!swrinde!howland.reston.ans.net!news.sprintlink.net!uunet!in1.uu.net!newstf01.news.aol.com!newsbf02.news.aol.com!not-for-mail
From: hpyockey@aol.com (HPYockey)
Newsgroups: bionet.info-theory
Subject: Re: Yockey Definitions
Date: 15 Feb 1995 12:44:36 -0500
Organization: America Online, Inc. (1-800-827-6364)
Lines: 62
Sender: root@newsbf02.news.aol.com
Message-ID: <3htei4$3nd@newsbf02.news.aol.com>
References: <3hnq3b$fiu@lm1.oryx.com>
Reply-To: hpyockey@aol.com (HPYockey)
NNTP-Posting-Host: newsbf02.mail.aol.com

Subject: Reply to Yockey Definitions
From: xdegrm@oryx.com (glenn r morton)
Date: 13 Feb 1995 14:24:43 GMT
Message-ID: <3hnq3b$fiu@lm1.oryx.com>
The query is:

I and another fellow have just finished reading Yockey's excellent
Information Theory and Molecular Biology.  But we disagree on a definition
and would like some help from an expert.


REPLY from Hubert P. Yockey
Who could help better than the author himself!
There seems to be a confusion here between the "high probability set"
discussed in connection with the Shannon-McMillan theorem and my
discussion of the meaning of the words "complexity" "order" "orderliness"
and information content. 

Let us take up the Shannon-McMillan theorem. The sequences one deals with
are almost always generated by Markov processes where the elements are not
all equally probable. The elementary calculation of the number of POSSIBLE
sequences in a chain of, say, 100 elements, each drawn from n equally
probable events, gives n to the power 100. For example, it is sometimes said
that the number of POSSIBLE sequences of 100 amino acids is 20 to the power
100. This is technically true but unimportant. It is a practical matter that,
almost always, we are not interested in events of vanishingly small
probability.

The Shannon-McMillan theorem tells us that since amino acids are not all
equally probable we are including many sequences of probability very near
zero. The theorem tells us that the ensemble of such
sequences can be divided into two sets, a high probability set and a low
probability set. The number of sequences in the high probability set is 2
to the power 100 H, where H is the entropy per element in bits. Each of these
sequences has probability approximately 2 to the power minus 100 H.

We are talking about the number of sequences, not individual events. There
will be 2's, 12's, 7's and all the other numbers that appear in the toss
of the dice.

In the dice game the events are not all equally probable, for example 7 is
six times as probable as 2 or 12.  Note on page 76 that the number of
sequences in the high probability set of  100 throws of the dice is 2.69
times 10 to minus 6 of the total possible number of sequences. The reason
that protein sequences are in the high probability group is simply that
amino acids are not all equally probable.
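
The dice figure can be checked with a few lines of arithmetic.  This sketch
assumes the "events" are the totals 2 through 12 of two fair dice, so that
there are 11 possible events per throw; the variable names are mine.

```python
from collections import Counter
from math import log2

# Probability of each total when two fair dice are thrown.
totals = Counter(a + b for a in range(1, 7) for b in range(1, 7))
probs = {total: count / 36 for total, count in totals.items()}

# Entropy per throw, in bits.
H = -sum(p * log2(p) for p in probs.values())

n = 100                                  # number of throws
log2_high = n * H                        # log2 of the size of the high probability set
log2_all = n * log2(len(probs))          # log2 of 11 to the power 100
fraction = 2 ** (log2_high - log2_all)   # high probability set / all sequences

if __name__ == "__main__":
    print("P(7)/P(2) =", probs[7] / probs[2])
    print("H =", H, "bits per throw")
    print("fraction =", fraction)
```

The fraction comes out at roughly 2.7 times 10 to the minus 6, in agreement
with the figure quoted from page 76.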

So the final result is that DNA, mRNA and protein sequences are in the
high probability set because the probabilities of their components are
not all equal, as they are in the coin toss.

With regard to randomness, in addition to what I wrote in the book, I
suggest you read Yockey "When is Random Random?" Nature 344 page 823.

'Highly organized' means that the sequence must be described by a message
nearly as long as itself. Soldiers on parade are 'highly ordered'. General
Motors is 'highly organized'.

Keep asking questions. 

Regards Hubert P Yockey

From owner-info-theory@net.bio.net Tue Feb 14 22:00:00 1995
Path: biosci!uunet.uu.net!tripos!rigel!zauhar
From: tripos!rigel!zauhar@uunet.uu.net (Randy Zauhar)
Newsgroups: bionet.info-theory
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Date: 15 Feb 1995 10:17:18 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 48
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <199502151746.RAA10209@rigel.tripos>
NNTP-Posting-Host: net.bio.net


 Comparing Algorithmic complexity and Shannon entropy, Tim Murphy
  writes:

   > 
   > In fact, there is no difference between the 2 methods.
   > In order to apply Shannon entropy,
   > you first have to compute the probability distribution.
   > To do this, you consider first single characters,
   > then sequences of 2 characters, then 3 characters, and so on.
   > You can never tell when you have gone far enough.
   > All you can do is to approximate to the Shannon entropy,
   > having decided that it is sufficient say to consider 3-sequences.
   > 
   > This exactly mirrors the uncertainty in the Chaitin case.
   > You similarly decide that you will only try a given input p,
   > to see if U(p) = s,
   > for a certain number of steps.
   > 
   > I believe that these 2 uncertainties are in fact exactly the same.
   > 
   > -- 
   > Timothy Murphy  
   > e-mail: tim@maths.tcd.ie
   > tel: +353-1-2842366
   > s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland
   > 

  First, the uncertainties are numerically not equivalent - the string 
 pi looks "maximally complex" to the Shannon measure, "minimally complex"
 to the AC measure. Second, in computing the Shannon measure, you have
 an algorithm for directly estimating the measure (you can certainly
 compute a series of probability distributions from your data!) Since
 there is no algorithm to find the shortest algorithm to generate your 
 data, how do you propose to find the AC measure in general? While Chris
 Hillman has pointed out that there are certain cases where the measures
 may be related, they are still very different objects!
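
 To show what "a series of probability distributions" looks like in practice,
 here is a minimal sketch that estimates the uncertainty per symbol from
 blocks of length 1, 2, 3, ... of the data.  The function names are invented
 for this example, and the estimates are only as good as the amount of data:
 for long blocks and short sequences they are badly biased.

```python
from collections import Counter
from math import log2

def block_entropy(seq, n):
    """Shannon entropy (bits) of the observed distribution of n-long blocks."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def conditional_entropy(seq, n):
    """Estimated bits per symbol given the previous n-1 symbols: H(n) - H(n-1)."""
    if n == 1:
        return block_entropy(seq, 1)
    return block_entropy(seq, n) - block_entropy(seq, n - 1)

if __name__ == "__main__":
    seq = "AG" * 50
    for n in (1, 2, 3):
        print(n, conditional_entropy(seq, n))
```

 For the repeat "AGAG..." the 1-long estimate is 1 bit per base (only A and G
 occur), while the 2-long and 3-long estimates are essentially zero; the tiny
 nonzero values come from end effects in a finite string.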

     Randy

All opinions expressed here are mine, not my employer's

///////////////////////////////////////////////////////////////////////// 
\\ Randy J. Zauhar, PhD             | E-mail: zauhar@tripos.com        //
\\ Tripos, Inc.                     |       : uunet!tripos.com!zauhar  //
\\ 1699 S. Hanley Rd., Suite 303    |  Phone: (314) 647-1099 Ext. 3382 //
\\ St. Louis, MO 63144              |                                  //
/////////////////////////////////////////////////////////////////////////

From owner-info-theory@net.bio.net Tue Feb 14 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!news.sprintlink.net!EU.net!ieunet!maths.tcd.ie!not-for-mail
From: tim@maths.tcd.ie (Timothy Murphy)
Newsgroups: bionet.info-theory
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Date: 15 Feb 1995 16:25:23 -0000
Organization: Dept. of Maths, Trinity College, Dublin, Ireland.
Lines: 30
Distribution: bionet
Message-ID: <3ht9tj$c40@hamilton.maths.tcd.ie>
References: <3h0875$ek1@hamilton.maths.tcd.ie>> <D3nJzJ.1tt@ncifcrf.gov> <SRE.95Feb10083824@al.cam.ac.uk> <D408CM.8Hz@ncifcrf.gov>
NNTP-Posting-Host: hamilton.maths.tcd.ie

toms@fcsparc6.ncifcrf.gov (Tom Schneider) writes:

>Algorithmic complexity goes way beyond this and says that there exists an
>algorithm which can generate the sequence.  An example is the sequence of pi,
>3.14159 ... which looks "random" but can be generated by a reasonably small
>program.  The hard part of this idea, as I understand it, is that there is no
>algorithm for getting the algorithm!  So nobody can make practical
>measurements.  If you can't get numbers, what good is the method?

In fact, there is no difference between the 2 methods.
In order to apply Shannon entropy,
you first have to compute the probability distribution.
To do this, you consider first single characters,
then sequences of 2 characters, then 3 characters, and so on.
You can never tell when you have gone far enough.
All you can do is to approximate to the Shannon entropy,
having decided that it is sufficient say to consider 3-sequences.

This exactly mirrors the uncertainty in the Chaitin case.
You similarly decide that you will only try a given input p,
to see if U(p) = s,
for a certain number of steps.

I believe that these 2 uncertainties are in fact exactly the same.

-- 
Timothy Murphy  
e-mail: tim@maths.tcd.ie
tel: +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland

From owner-info-theory@net.bio.net Tue Feb 14 22:00:00 1995
Path: biosci!TAU.BEKKOAME.OR.JP!isshikim
From: isshikim@TAU.BEKKOAME.OR.JP (Masashi Isshiki)
Newsgroups: bionet.info-theory
Subject: unsubscribe
Date: 14 Feb 1995 16:46:13 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 9
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <199502150047.JAA11485@bekkoame.bekkoame.or.jp>
NNTP-Posting-Host: net.bio.net

unsubscribe bio-info isshikim@tau.bekkoame.or.jp
Masashi Isshiki
the 4th. Dep. of Int. Med.,Faculty of Medicine,Tokyo Univ.

E-mail: 
   isshiki-tky@umin.u-tokyo.ac.jp
   isshikim@tau.bekkoame.or.jp
   PXG00710@niftyserve.or.jp


From owner-info-theory@net.bio.net Tue Feb 14 22:00:00 1995
Path: biosci!rutgers!uwm.edu!reuter.cse.ogi.edu!netnews.nwnet.net!nntp.cac.washington.edu!math.washington.edu!hillman
From: hillman@math.washington.edu (Christopher Hillman)
Newsgroups: bionet.info-theory
Subject: Re: DNA Programming
Date: 14 Feb 1995 23:38:33 GMT
Organization: "University of Washington, Mathematics, Seattle"
Lines: 116
Distribution: world
Message-ID: <3hretp$5f5@news.u.washington.edu>
References: <famulus-1402950955540001@ppp-83-4.bu.edu>
NNTP-Posting-Host: ionesco.math.washington.edu

In article <famulus-1402950955540001@ppp-83-4.bu.edu>,
famulus@acs.bu.edu (Alex Kasman) writes:

|> Mathematical objects have well defined properties, but no particular
|> "real" interpretation.  By finding real systems which have these
|> properties, one can draw conclusions about the real system from the
|> "mathematical model".  This is generally referred to as "applied
|> mathematics".  It is theoretically possible to do things in the other
|> direction.  That is, finding a real system and a mathematical model,
|> one could answer a mathematical question by examining the real
|> situation.  For example, the solution to a certain differential
|> equation can be interpreted as the temperature of a hot yam losing
|> heat in a room of fixed temperature ["Calculus" by Hughes Hallett, et
|> al., Wiley 1994, p. 508].  Consequently, one could continuously
|> measure the temperature of an actual yam and thereby solve one
|> particular differential equation.  Still, nobody would claim that this
|> demonstrates the feasibility of solving differential equations using
|> yams.

Certainly you are correct in saying that we are accustomed to using
a theoretical analysis of some mathematical model to make predictions about
the behavior of a biochemical or physical "system".  I agree that Adleman's
ideas could be described as the "inverse procedure" of using the behavior of
a biochemical system to come up with a solution to a mathematical problem which
is not susceptible to mathematical analysis.  (More exactly, assuming NP \neq P,
there is no algorithm for finding even an approximate solution to a sufficiently
large Hamiltonian tour problem which will not have running time exceeding 
the age of the Universe.  Of course, you can always proceed by trial and
error, but the chances of success are negligible.)

Adleman's ideas are extremely novel, but his ``inverse approach'' is NOT
new.  In particular, before the days of the digital computer, there were things
called analog computers which mathematicians used to approximate hard definite
integrals.   For instance, I think it was once common practice to use a
mechanical device called a planimeter to compute hard integrals which are
not susceptible to theoretical analysis yielding an exact value (by Liouville's
theory of integration, without ad hoc definition of new functions, such integrals
exist.)  This procedure amounts to using the behavior of a physical system
(the planimeter, a complicated system of linkages) to produce an approximate
solution to a hard mathematical problem.  I see a close analogy with what
Adleman has done in the lab--- with this difference, that all earlier "inverse
procedures" (to my knowledge) relied upon "analog computation" to yield
an APPROXIMATE answer, whereas Adleman has taken advantage of the "digital"
nature of DNA to use a DNA soup to come up with an EXACT answer to a
hard mathematical problem.  The fact that his answer is EXACT and that it
is obtained by a DIGITAL "computation" performed by the biochemical system
(the DNA soup) is what is truly novel about his procedure, in my opinion.

Incidentally, one installment of Douglas Hofstadter's now defunct column,
which ran in Scientific American for a few years, concerned precisely
the idea of using physical systems to solve mathematical problems.
DH discussed (as I recall) a "spaghetti computer" for solving scheduling
problems (if memory serves) and also a "quantum computer" for solving
certain other kinds of hard problems.  I cannot recall the exact reference
but probably this paper is worth reading (or re-reading) in the context
of Adleman's work and what it implies for the future of computing.

|> The paper by Adleman is useful and interesting, and it has brought
|> attention to the potential for developing molecular computing devices.
|> However, the particular results seem to be a better example of
|> "inverse" applied mathematics than of programming or the sort of
|> molecular Turing machine which he mentions.

I think it is fair to say that his procedure as described in the Science paper
is better described as a "subroutine" which can be incorporated into a
program written for a conventional computer as an "oracle" which can
solve every instance of a very special problem.   The real practical objection
to the procedure as described in the Science paper is that the current
procedure might not work well for a really large number of vertices.
There has been intense discussion on this newsgroup concerning how Adleman's
original procedure can be modified to work for any number of vertices,
or even to solve completely different problems.

Adleman's paper has already generated at least two new preprints.  In one
of these, Richard Lipton, a computer scientist at Princeton, claims that
a modified procedure is capable of solving any instance of a certain problem
concerning the logic of circuits.  If this is true, then DNA computing
is capable of solving anything a Turing machine can solve.  (I confess I
don't know enough about computer science to really understand his paper;
maybe John Amenyo can explain what he is saying for the rest of us?)
Lipton's preprint may be found (if you have Mosaic or some other web-browser)
at the URL:

/ftp/pub/people/rjl/bio.ps

(This URL begins with ftp, so you can probably get it using ftp.) 
Adleman himself has also written a very thought provoking follow-up to his
paper, which I have a copy of somewhere and volunteer to send to interested
readers.

|> Note (added to news posting, not to letter): I realize that a calculator
|> also is just a system whose behavior is the same as a mathematical system
|> (arithmetic) and that this is why it is useful.  However, I think the
|> significant thing is that, since we have developed circuits that are
|> equivalent to the boolean operators and the functions of a Turing machine,
|> we can make electrical circuits do anything (within reason).  On the other
|> hand, I see nothing like this in Adleman's work; just the fact that DNA
|> anneals only for particular base pairings.
 
This is precisely why Lipton's work is potentially so important.  Since I
may well have misunderstood his paper, I look forward to comments by anyone
else who has read it, particularly those who know something about logic
gates and so forth.

About Turing machines--- I agree with you that the idea of a biological
Turing machine seems on the surface to be rather different from what LA
actually did.  If Lipton is correct, there is a new paradigm for a general
purpose computer which is not a Turing machine but an Adleman-like procedure
involving human beings setting up and running experiments with DNA soups.
I discussed in an earlier posting on the implications of Adleman's work
the idea of thinking about the process of living (at the level of a single
cell, perhaps an E Coli) as being in some ways analogous to computing.
This analogy suggests that still a third paradigm for a universal computer
might exist (my speculations were quite vague).

Chris Hillman

From owner-info-theory@net.bio.net Tue Feb 14 22:00:00 1995
Path: biosci!rutgers!uwm.edu!news.moneng.mei.com!howland.reston.ans.net!news.sprintlink.net!uunet!in1.uu.net!nntp.cac.washington.edu!math.washington.edu!hillman
From: hillman@math.washington.edu (Christopher Hillman)
Newsgroups: bionet.info-theory
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Date: 15 Feb 1995 20:05:24 GMT
Organization: "University of Washington, Mathematics, Seattle"
Lines: 128
Distribution: bionet
Message-ID: <3htmq4$ed2@news.u.washington.edu>
References: <3h0875$ek1@hamilton.maths.tcd.ie>> <D3nJzJ.1tt@ncifcrf.gov> <SRE.95Feb10083824@al.cam.ac.uk> <D408CM.8Hz@ncifcrf.gov> <3ht9tj$c40@hamilton.maths.tcd.ie>
NNTP-Posting-Host: escher.math.washington.edu

In article <3ht9tj$c40@hamilton.maths.tcd.ie>,
tim@maths.tcd.ie (Timothy Murphy) writes:


|> In order to apply Shannon entropy,
|> you first have to compute the probability distribution.
|> To do this, you consider first single characters,
|> then sequences of 2 characters, then 3 characters, and so on.
|> You can never tell when you have gone far enough.
|> All you can do is to approximate to the Shannon entropy,
|> having decided that it is sufficient say to consider 3-sequences.

Tim, that's not true.  Take a look at Peter Walters' textbook
Introduction to Ergodic Theory, Springer, 1981.  The entropy you refer
to is written there as h(S,\A) for a measure-preserving transformation
S on some probability measure space (X,\M,\mu).  In many important cases
the limit can be computed exactly on theoretical grounds.  In particular,
the exact values are known for whole classes of dynamical systems, such
as shifts of finite type.  This situation contrasts strongly with the
case of Chaitin's entropy, where to my knowledge (I might change my
mind if I had a chance to examine Li and Vitanyi's textbook on this
entropy) there are no infinite classes where the Chaitin entropy can
be computed exactly, just a small collection of very special cases.

Note: h(S,\A) is the entropy involved in the modern statement of the
Shannon-McMillan-Breiman or Equipartition theorem (see Walters for
details).

If you do not know the S-invariant measure \mu, and/or do not know S,
so that you must estimate the entropy empirically from the asymptotic
behavior of what you hope is a typical sequence x, S(x), ...--- this
was the situation when Shannon tried to estimate the entropy per bit
of natural languages--- then I agree there is a slight analogy with
the fact that the exact value of Chaitin's entropy can only rarely be
found.  However, Shannon's entropy can be estimated AS CLOSELY AS DESIRED
by the limiting process you describe (provided that you assume that
your sequence is typical--- this is true with probability one but not
with absolute certainty), whereas I am sure there is NO GENERAL METHOD
for estimating Chaitin's entropy as closely as desired.   As I mentioned
in a previous post, Thomas and Cover do discuss in the book cited below
some simple upper bounds for Chaitin's entropy,
but there is no way to obtain an upper bound as close as desired to
the actual value.   For a different method of estimating source entropy
as closely as desired from a typical sequence, see the article by
Ornstein and Weiss cited below.
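
The limiting process described above can be sketched as follows.  This is
only a minimal illustration, not anything taken from the references: the
function names, the four-letter test source and its weights are all
assumptions made for the example.  It estimates H_n/n from the empirical
n-gram frequencies of one long sequence that is assumed to be typical.

```python
from collections import Counter
from math import log2
import random


def block_entropy(seq, n):
    """Empirical Shannon entropy (in bits) of the overlapping n-grams of seq."""
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def entropy_rate_estimates(seq, max_n):
    """Return the per-symbol estimates H_n / n for n = 1 .. max_n."""
    return [block_entropy(seq, n) / n for n in range(1, max_n + 1)]


# Hypothetical test source (an assumption for this sketch): independent
# symbols with P(A)=1/2, P(C)=1/4, P(G)=P(T)=1/8, so the true entropy
# is 0.5*1 + 0.25*2 + 2*0.125*3 = 1.75 bits per symbol.
rng = random.Random(0)
seq = rng.choices("ACGT", weights=[4, 2, 1, 1], k=200_000)

estimates = entropy_rate_estimates(seq, 3)
for n, h in enumerate(estimates, start=1):
    print(f"n = {n}: H_n/n = {h:.4f} bits per symbol")
```

For a memoryless source like this one, every n gives roughly the same
value.  For a source with memory, H_n/n decreases toward the source
entropy as n grows, and longer sequences are needed to keep the n-gram
frequencies reliable.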

|> This exactly mirrors the uncertainty in the Chaitin case.
|> You similarly decide that you will only try a given input p,
|> to see if U(p) = s,
|> for a certain number of steps.

I think you have a basic misunderstanding here.  The reason that Chaitin's
entropy is uncomputable is very simple--- some programs never halt when
run on U.  By the Halting Theorem, there is no way to tell whether some
program you are testing on U will never halt or simply has a running time
greater than the lifetime of the Universe.   So your procedure cannot
work, unless you have a countable number of copies of U available to run
programs simultaneously.   Chaitin's proof using algorithmic information
theory of Godel's Theorem, and the relation to the Halting Theorem, are
discussed in a beautiful article by Martin Davis cited below.

|> I believe that these 2 uncertainties are in fact exactly the same.

That is not true.  They are distinct, although there are certain asymptotic
relations between them.   To see that they are distinct, consider the fact
that Shannon's source entropy h(S,\A) depends upon the definition of some
S-invariant ergodic measure on the space X where S lives, as well as some
finite sigma-subalgebra \A of the sigma-algebra \M on X.  Chaitin's entropy
is defined for finite binary strings.  As I said before, we know from
Shannon's original work that the sequences output by the source (X,\M,\mu,S,\A)
fall into two classes--- the typical ones, which have measure one, and the
atypical ones, which predominate in number but have measure zero.
As I discussed in a posting last year, there are theorems relating the
Shannon source entropy to the Chaitin entropy of a typical sequence output
by that source (see the article by White cited below),
but the Chaitin entropy referred to here is the limit as
n goes to infinity of K(x_n)/n, where x_n is the finite sequence consisting
of the first n symbols.   This limit might not exist!  But when it does, 
under certain situations it must agree with the source entropy.  But even
then you can find uncountably many distinct sequences whose Chaitin entropy
(in the above sense) either does not exist, or differs from the source entropy.

It is true that Chaitin showed how to define a "canonical" probability measure on
a subset of the set of finite binary sequences (namely, the prefix-free programs
for U), such that -log \mu(p) is within an additive constant (depending on U) of the
Chaitin entropy of p, but K(p) \neq -log \mu(p), in general!  See the book by
Cover and Thomas for details.

So these two quantities are defined in completely different contexts, and while
there are precise relations between them in some cases, these relations are 
NEVER as simple as saying ``Chaitin's entropy of a finite string may be
identified with Shannon's entropy of a stochastic message source''.
The article by Jensen cited below has a good, easy to read discussion of
the relations between various dynamical quantities such as information
dimension, metric and topological entropy, Lyapunov numbers, etc.--- in
general, these relations are quite complex and must be stated with great
care.

--------------------------------------------------------------------------------

REFERENCES:

Thomas M. Cover and Joy A. Thomas, Elements of information theory.
New York: Wiley, 1991.

Martin Davis, What is a computation?  In Mathematics today: twelve informal
essays, ed. by Lynn Arthur Steen. New York: Springer, 1978.

Lyman P. Hurd, Jarkko Kari, and Karel Culik, The topological entropy of
cellular automata is uncomputable,  Ergodic Theory and Dynamical Systems
(1992), 12, 255--265.

Jens Ledet Jensen, Chaotic dynamical systems with a view toward statistics:
a review, in Networks and chaos : statistical and probabilistic aspects,
edited by O.E. Barndorff-Nielsen, J.L. Jensen, and W.S Kendall.
New York : Chapman & Hall, 1993.

Donald Samuel Ornstein and Benjamin Weiss, Entropy and data compression schemes,
IEEE Transactions on Information Theory (1993) vol. 39 No. 1, 78--83.

Peter Walters, Introduction to ergodic theory.  New York: Springer, 1981.

Homer S. White, Algorithmic complexity of points in dynamical systems,
Ergodic Theory and Dynamical Systems (1993) 13, 807--830

---------------------------------------------------------------------------------

Chris Hillman

From owner-info-theory@net.bio.net Wed Feb 15 22:00:00 1995
Path: biosci!uunet.uu.net!tripos!rigel!zauhar
From: tripos!rigel!zauhar@uunet.uu.net (Randy Zauhar)
Newsgroups: bionet.info-theory
Subject: Re: DNA Programming
Date: 16 Feb 1995 10:24:13 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 38
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <199502161703.RAA03426@rigel.tripos>
NNTP-Posting-Host: net.bio.net


   > 
   > What I find fascinating is that the correspondence between mathematical
   > objects and "real" objects seems infallible, at least in some cases.
   > For example, euclidian geometry has probably NEVER failed a carpenter,
   > for all practical purposes.  I doubt that mathematical concepts are
   > "purely abstract" and "just happen" to correspond well to the physical
   > world.  There has to be some connection.  Maybe Plato's forms can
   > somehow offer an answer.  Anyone care to comment?
   > 
   > 

     From a non-philosopher's perspective, I think that mathematical systems
  are "rich" enough to describe ANY regular behavior (and for that matter,
  even some of the properties of apparent "chaotic" behavior!) Mathematical
  constructs such as Euclidean geometry are, after all, idealizations of
  observations in the natural world. One is constructing a "virtual world"
  with mathematical objects that in some sense reproduce the behavior
  observed in nature. So long as the natural world follows "rules", and 
  you have captured those rules adequately in your construction, your model
  is then a realistic descriptor of observed reality. Of course, your 
  model lives in a very different realm than the real world!

     I sometimes get the sense that philosophers think that mathematics
  somehow "accidentally" reproduces nature. While I may be missing some
  deep and subtle points, I wonder if THEY really have the whole picture!

     Randy


All opinions expressed here are mine, not my employer's

///////////////////////////////////////////////////////////////////////// 
\\ Randy J. Zauhar, PhD             | E-mail: zauhar@tripos.com        //
\\ Tripos, Inc.                     |       : uunet!tripos.com!zauhar  //
\\ 1699 S. Hanley Rd., Suite 303    |  Phone: (314) 647-1099 Ext. 3382 //
\\ St. Louis, MO 63144              |                                  //
/////////////////////////////////////////////////////////////////////////

From owner-info-theory@net.bio.net Wed Feb 15 22:00:00 1995
Path: biosci!SNFMA1.IF.USP.BR!szeinfel
From: szeinfel@SNFMA1.IF.USP.BR (Rafael Iosef Najmanovich Szeinfeld {S)
Newsgroups: bionet.info-theory
Subject: Re: DNA Programming
Date: 16 Feb 1995 10:23:32 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 25
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <Pine.SUN.3.90.950216161616.7354A-100000@snfma1.if.usp.br>
References: <3hvais$h9h@elaine33.Stanford.EDU>
NNTP-Posting-Host: net.bio.net

On 16 Feb 1995, Helen Claudia Gremillion wrote:

> What I find fascinating is that the correspondence between mathematical
> objects and "real" objects seems infallible, at least in some cases.
> For example, euclidian geometry has probably NEVER failed a carpenter,
> for all practical purposes.  I doubt that mathematical concepts are
> "purely abstract" and "just happen" to correspond well to the physical
> world.  There has to be some connection.  Maybe Plato's forms can
> somehow offer an answer.  Anyone care to comment?

	While discussing this point with a friend we concluded that math
need not agree with reality (whatever this means). The point is that
the way of reasoning our brain uses to translate the inputs from the external
world (photons, mechanical stimuli) into "thoughts" may be similar to the
one used (by our brain) to construct mathematics.
	Hope this helps,
				Rafael.
*---------------------------------------------------------------------*
* Rafael Iosef Najmanovich Szeinfeld | Depto. de Fisica Geral         *
* Statistical Mechanics Group        | Instituto de Fisica            *
* General Physics Deptartament       | Universidade Sao Paulo         *
* Physics Institute                  | Rua do Matao s/n CEP 01452-990 *     
* University of Sao Paulo - Brazil   | Caixa Postal 20516             * 
* E-mail: szeinfel@snfma1.if.usp.br  | Sao Paulo - Brasil             *
*---------------------------------------------------------------------*

From owner-info-theory@net.bio.net Wed Feb 15 22:00:00 1995
Path: biosci!rutgers!uwm.edu!news.moneng.mei.com!howland.reston.ans.net!news.sprintlink.net!uunet!in1.uu.net!newstf01.news.aol.com!newsbf02.news.aol.com!not-for-mail
From: hpyockey@aol.com (HPYockey)
Newsgroups: bionet.info-theory
Subject: Re: Yockey Definitions
Date: 16 Feb 1995 12:10:10 -0500
Organization: America Online, Inc. (1-800-827-6364)
Lines: 78
Sender: root@newsbf02.news.aol.com
Message-ID: <3i00ti$it0@newsbf02.news.aol.com>
References: <3hnq3b$fiu@lm1.oryx.com>
Reply-To: hpyockey@aol.com (HPYockey)
NNTP-Posting-Host: newsbf02.mail.aol.com

Subj:  Re: Yockey Definitions
Date:  Wed, Feb 15, 1995 2:11 PM EST
From:  Glenn.Morton@oryx.com
To: hpyockey@aol.com, GRMorton@aol.com

If I can bother you for one more question, I will have all the current
issues wrapped up in a pretty little bow.  Your example of the amino
acids being of unequal probability is quite clear to me due to the
direction of information flow.  But what confused me most was the
statement that DNA sequences are all in the high probability set.  How
this can be is not clear to me.  We observe DNA sequences with w,x,y,z
probabilities for the four nucleotides.  In that sense, one can calculate
a high and low probability set, but since I can't see any fundamental
physical reason why G should be preferred to C, A or T, other than that
it just happened like that, I fail to see that calling DNA sequences
'high probability' means anything.

I hope I am making sense.  I am a geophysicist who works in signal
analysis so I am a little bit away from my field but not on the other
side of the world.

Thank you so much for your reply.  Your book was the most fascinating
reading I have done for quite a while.  Unfortunately, there were two
proofs in your book I was unable (or too stupid) to follow and the
Shannon-McMillan theorem was one of them.  I thought I had gotten the
gist of it and you confirmed that.

glenn

To: Glenn
Let us go to Section 2.3.2: Is the number of sequences the same as the
total number possible?
Equation 2.45 shows that the probability of each of the sequences concerned
is 2 to -NH. Accordingly the number of these sequences is 2 to +NH. This number
is often MUCH smaller than the n to the Nth power that we learned in school.
Where did all these sequences go?

The Shannon-McMillan theorem gives us the answer. Those sequences included
in those of probability 2 to -NH are the high probability group or set we
have been discussing. This applies to any set of sequences where the
members of the alphabet are not all of equal probability. Protein
sequences and the sequences generated by a Markov process by tossing two
dice are in that class. Tossing of a fair coin is not because all events
or tosses are of the same probability, namely, 1/2.

What the Shannon-McMillan Theorem tells us is that we may ignore the total
number of those sequences not given by 2 to NH. The TOTAL probability of
all such sequences is less than a small number eta.

See page 76 where I apply the Shannon-McMillan Theorem to the dice game.
One of the POSSIBLE sequences in the group of negligible probability would
be composed only of snake eyes and box cars. If you see a sequence of 100
snake eyes and box cars you had better examine the dice. So now you see that
G is not preferred to C, A or T. It is the sequences of G, A, C or T that are
in a high probability set or group, in contrast to those sequences of low
probability. NOT all possible DNA sequences are in the high probability
set or group.
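
As a rough numerical illustration of the counting argument above, here is
a small sketch.  It assumes the symbols are the sums of two fair dice,
drawn independently, which is a simplification made for this example and
not necessarily the exact game in the book; the variable names are
likewise illustrative.  It compares log2 of the number of high probability
sequences, N*H, with log2 of the total number of sequences, N*log2(n).

```python
from fractions import Fraction
from math import log2

# Distribution of the sum of two fair dice: sums 2..12, alphabet size n = 11.
probs = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
assert sum(probs.values()) == 1

# Entropy per toss in bits: about 3.27, versus log2(11), about 3.46.
H = -sum(float(p) * log2(float(p)) for p in probs.values())

N = 100                                   # sequence length
log2_typical = N * H                      # log2 of roughly 2^(N*H) sequences
log2_total = N * log2(len(probs))         # log2 of n^N sequences

# Total probability (log2) of all length-N sequences made up only of
# snake eyes (sum 2) and box cars (sum 12).
log2_extreme = N * log2(float(probs[2] + probs[12]))

print(f"H = {H:.4f} bits per toss")
print(f"log2(number of high probability sequences) ~ {log2_typical:.1f}")
print(f"log2(total number of sequences)            = {log2_total:.1f}")
print(f"log2 P(only snake eyes and box cars)       = {log2_extreme:.1f}")

# For a fair coin the two counts coincide: H = 1 bit = log2(2).
H_coin = -sum(p * log2(p) for p in (0.5, 0.5))
```

The gap between the two exponents grows linearly with N, so for long
sequences the high probability set is a vanishing fraction of all
possible sequences even though it carries nearly all of the probability.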

Now that you know about and understand the Shannon-McMillan Theorem you
are better informed than virtually all molecular biologists, at least on
this topic.

I appreciate your flattering comments that are high praise for a book full
of mathematics. Such books are supposed to be dull and dry! Be sure to
tell your friends.
 
What did you think of my comments on the origin of life and the
self-organization of Manfred Eigen? That is a lot more controversial than
the Shannon-McMillan Theorem.

Best Regards Hubert 

From owner-info-theory@net.bio.net Wed Feb 15 22:00:00 1995
Path: biosci!daresbury!bioftp.unibas.ch!citi2.fr!jussieu.fr!oleane!pipex!swrinde!cs.utexas.edu!news.sprintlink.net!uunet!caen!uwm.edu!fnnews.fnal.gov!unixhub!news.Stanford.EDU!not-for-mail
From: pies@leland.Stanford.EDU (Helen Claudia Gremillion)
Newsgroups: bionet.info-theory
Subject: Re: DNA Programming
Date: 16 Feb 1995 02:49:00 -0800
Organization: Stanford University
Lines: 18
Message-ID: <3hvais$h9h@elaine33.Stanford.EDU>
References: <famulus-1402950955540001@ppp-83-4.bu.edu>
NNTP-Posting-Host: elaine33.stanford.edu

In article <famulus-1402950955540001@ppp-83-4.bu.edu>,
Alex Kasman <famulus@acs.bu.edu> wrote:
>
>Mathematical objects have well defined properties, but no particular
>"real" interpretation.  By finding real systems which have these
>properties, one can draw conclusions about the real system from the
>"mathematical model".  This is generally referred to as "applied
>mathematics".


What I find fascinating is that the correspondence between mathematical
objects and "real" objects seems infallible, at least in some cases.
For example, euclidian geometry has probably NEVER failed a carpenter,
for all practical purposes.  I doubt that mathematical concepts are
"purely abstract" and "just happen" to correspond well to the physical
world.  There has to be some connection.  Maybe Plato's forms can
somehow offer an answer.  Anyone care to comment?


From owner-info-theory@net.bio.net Wed Feb 15 22:00:00 1995
Path: biosci!rutgers!uwm.edu!vixen.cso.uiuc.edu!howland.reston.ans.net!Germany.EU.net!ieunet!maths.tcd.ie!not-for-mail
From: tim@maths.tcd.ie (Timothy Murphy)
Newsgroups: bionet.info-theory
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Date: 16 Feb 1995 18:48:53 -0000
Organization: Dept. of Maths, Trinity College, Dublin, Ireland.
Lines: 106
Distribution: bionet
Message-ID: <3i06ml$1pv@hamilton.maths.tcd.ie>
References: <3h0875$ek1@hamilton.maths.tcd.ie>> <D3nJzJ.1tt@ncifcrf.gov> <SRE.95Feb10083824@al.cam.ac.uk> <D408CM.8Hz@ncifcrf.gov> <3ht9tj$c40@hamilton.maths.tcd.ie> <3htmq4$ed2@news.u.washington.edu>
NNTP-Posting-Host: hamilton.maths.tcd.ie

hillman@math.washington.edu (Christopher Hillman) writes:

>|> In order to apply Shannon entropy,
>|> you first have to compute the probability distribution.
>|> To do this, you consider first single characters,
>|> then sequences of 2 characters, then 3 characters, and so on.
>|> You can never tell when you have gone far enough.
>|> All you can do is to approximate to the Shannon entropy,
>|> having decided that it is sufficient say to consider 3-sequences.

>Tim, that's not true.  Take a look at Peter Walters' textbook
>Introduction to Ergodic Theory, Springer, 1981.  The entropy you refer
>to is written there as h(S,\A) for a measure-preserving transformation
>S on some probabilty measure space (X,\M,\mu).  

My point is that in practice you don't have a measure space --
you have to compute the probabilities from the data.
In my opinion you actually choose the probabilities
to maximise the entropy, as far as you are able.

So I think the difference between us is this:
you are starting from a probability space,
but I am starting from raw data.
I believe the latter is more realistic in the case of DNA.

>However, Shannon's entropy can be estimated AS CLOSELY AS DESIRED
>by the limiting process you describe (provided that you assume that
>your sequence is typical--- this is true with probability one but not
>with absolute certainty), whereas I am sure there is NO GENERAL METHOD
>for estimating Chaitin's entropy as closely as desired.   

I believe the situations are exactly the same.
Suppose you have a program p of length 2 outputting s: U(p) = s.
Then you know that H(s) <= 2.
It may be that one of the (two) programs of length 1, say "0", never completes.
So you will never know if this program might output s,
giving H(s) = 1.
However, you try U("0") for 1,000,000 steps and it hasn't completed.
You're pretty sure it isn't going to complete and output s.

It's just the same with strings in Shannon's Theory.
You may have tested sequences of length up to 1,000,000
and found all frequencies equal.
But when you try sequences of length 1,000,001 you find
the same pattern occurs over and over again,
thus reducing the entropy dramatically.
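
A small-scale sketch of this effect, under assumptions chosen purely for
illustration (a pseudo-random block of 64 bits repeated 200 times; the
names are hypothetical): a sequence of period P has at most P distinct
n-grams, so H_n can never exceed log2(P) bits, and H_n/n must fall toward
zero once n is large compared with log2(P), however it looks for small n.

```python
from collections import Counter
from math import log2
import random


def block_entropy(seq, n):
    """Empirical Shannon entropy (in bits) of the overlapping n-grams of seq."""
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


# Assumed example: one pseudo-random block of 64 bits, repeated 200 times.
rng = random.Random(0)
period = [rng.randint(0, 1) for _ in range(64)]
seq = period * 200

# Each window is determined by its starting phase modulo 64, so there are
# at most 64 distinct n-grams and H_n <= log2(64) = 6 bits for every n.
rates = {n: block_entropy(seq, n) / n for n in (1, 2, 4, 12, 24)}
for n, r in rates.items():
    print(f"n = {n:2d}: H_n/n = {r:.3f} bits per symbol")
```

Short blocks cannot see the repetition; only blocks comparable in length
to the hidden structure reveal how low the entropy really is.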

>As I mentioned
>in a previous post, Thomas and Cover do discuss in the book cited below
>some simple upper bounds for Chaitin's entropy,
>but there is no way to obtain an upper bound as close as desired to
>the actual value.   

This isn't really relevant,
but I read Thomas & Cover after you recommended it,
and wasn't too happy at the treatment of Chaitin entropy.
Basically, they take Shannon entropy as the basic definition,
and try to see how Chaitin's approach fits in.

>I think you have a basic misunderstanding here.  The reason that Chaitin's
>entropy is uncomputable is very simple--- some programs never halt when
>run on U.  By the Halting Theorem, there is no way to tell whether some
>program you are testing on U will never halt or simply has a running time
>greater than the lifetime of the Universe.   

There is no way to tell, agreed.
But you will get more and more certain as you try for longer and longer.
When the program has run for 10^9 steps
you are willing to bet quite a lot it will never halt.
And even more that it will not halt with a given output.
[You see the miles and miles of tape filling your office,
and say to yourself, "There is no way that is all going to erase itself."]

>Chaitin's proof using algorithmic information
>theory of Godel's Theorem, and the relation to the Halting Theorem, are
>discussed in a beautiful article by Martin Davis cited below.

I'll certainly look at Martin Davis' article.
I think his book on Computability is one of the best textbooks I know.
I confess I didn't understand Chaitin's proof of Godel's Theorem.
He claimed somewhere that it follows from the fact that
the information in a theorem cannot exceed the information in the axioms
from which one starts,
but when one looked at the details this didn't in fact seem to be his argument.

>|> I believe that these 2 uncertainties are in fact exactly the same.

>That is not true.  

I confess that I have been infected with a slight doubt about this,
by the example of pi.
In fact nothing is known, I believe, about the frequency of digits
or digit sequences in pi.
(There may be no 0's after the first 10^9 digits.)
However, it seems to me that it _could_ be known
that all sequences of length n occur with equal frequency.

I don't think this invalidates my general argument above,
that Chaitin entropy is not in fact any harder to compute than Shannon entropy,
if one doesn't suppose at the outset that the probabilities are known.

-- 
Timothy Murphy  
e-mail: tim@maths.tcd.ie
tel: +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland

From owner-info-theory@net.bio.net Wed Feb 15 22:00:00 1995
Path: biosci!rutgers!uwm.edu!vixen.cso.uiuc.edu!howland.reston.ans.net!swrinde!elroy.jpl.nasa.gov!netline-fddi.jpl.nasa.gov!nntp-server.caltech.edu!news
From: Bradley Minch <bminch@pcmp.caltech.edu>
Newsgroups: bionet.info-theory
Subject: Wanted: Books
Date: 16 Feb 1995 18:19:53 GMT
Organization: California Institute of Technology, Pasadena
Lines: 31
Message-ID: <3i0509$d7o@gap.cco.caltech.edu>
NNTP-Posting-Host: pool.pcmp.caltech.edu

Hello Everyone,

	I am very much interested in purchasing either or both of
the following books:

@book{
author = "Donald M. MacKay and Michael E. Fisher",
title = "Analogue Computing at Ultra-High Speed: An Experimental and
Theoretical Study",
publisher = "John Wiley & Sons, Inc.",
place = "New York",
year = "1962"
}

@book{
author = "Donald M. MacKay",
title = "Information, Mechanism, and Meaning",
publisher = "The M.I.T. Press",
place = "Cambridge, MA",
year = "1969",
isbn = "262 13055 6"
}

If anyone out there has either of these titles (in reasonably good
shape) and is willing to part with them or has a line on where they
may be obtained, please let me know.  It would be appreciated very
much.  Please reply by e-mail only as I don't often read usenet news.

						Thanks much,
						Brad Minch


From owner-info-theory@net.bio.net Wed Feb 15 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!news.moneng.mei.com!uwm.edu!msunews!harbinger.cc.monash.edu.au!bruce.cs.monash.edu.au!lloyd
From: lloyd@cs.monash.edu.au (Lloyd Allison)
Subject: multiple alignment
Message-ID: <lloyd.792905826@bruce.cs.monash.edu.au>
Summary: Gibbs sampling multiple alignments
Keywords: information, alignment, minimum message/description length
Sender: news@bruce.cs.monash.edu.au (USENET News System)
Organization: Computer Science, Monash University, Australia
Date: Thu, 16 Feb 1995 03:37:06 GMT
Lines: 21

Self-promotion:

L.Allison & C.S.Wallace.
The posterior probability distribution of alignments and its application to
parameter estimation of evolutionary trees and to optimization of multiple
alignments.
J. Mol. Evol. 39 418-430 1994.

A few copies left; anyone want one?

We use Gibbs sampling to estimate the edge lengths of an evolutionary tree.
The sampling (feasible) approximates the average over all (infeasible!)
alignments.  More information at:
<A HREF="http://www.cs.monash.edu.au/~lloyd/tildeStrings">Molecular Biology</A>
and:
<A HREF="http://www.cs.monash.edu.au/~lloyd/tildeMML">MML</A>

Lloyd ALLISON
Dept. of Computer Science, Monash University, Clayton, Victoria 3168, AUSTRALIA
tel: 61 3 905 5205       fax: 61 3 905 5146       email: lloyd@cs.monash.edu.au


From owner-info-theory@net.bio.net Wed Feb 15 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!news.sprintlink.net!alfa02.medio.net!netnews.nwnet.net!nntp.cac.washington.edu!math.washington.edu!hillman
From: hillman@math.washington.edu (Christopher Hillman)
Newsgroups: bionet.info-theory
Subject: Re: DNA Programming
Date: 16 Feb 1995 05:30:59 GMT
Organization: "University of Washington, Mathematics, Seattle"
Lines: 44
Distribution: world
Message-ID: <3hunuj$kbb@news.u.washington.edu>
References: <famulus-1402950955540001@ppp-83-4.bu.edu> <D42sEn.AFD@ncifcrf.gov>
NNTP-Posting-Host: ionesco.math.washington.edu

In article <D42sEn.AFD@ncifcrf.gov>,
toms@fcsparc6.ncifcrf.gov (Tom Schneider) writes:

|> Later people did use electrical circuits with
|> capacitors, resistors etc to solve differential equations.  Some of the nicest
|> analog computers are those that find the minimum surface spanning a region -
|> which is easily done by dipping a wire in soapy water!

This comment reminded me of something I once read in Feynman's Lectures
on Physics (if memory serves).  The headlights for a certain early and now classic
car were designed like this: the differential equation for the motion
of an electron in an electric field due to some funny shaped charged plate
(or maybe it was a magnetic field and a funny shaped magnet) is the same
as the equation for the motion of a small ball rolling over a surface
with a funny shaped mountain ridge.  So the designers took a sheet of rubber
and pressed it up here with a wooden form and pressed it down there with some
pegs, and then they rolled marbles around and adjusted the heights of the
various pieces holding the rubber in place until they got the desired motion.
A beautiful example of using one physical system to model another!

But notice that here too only approximate answers were obtained (only
approximate answers were needed), in contrast to the ``digital computing''
performed by Adleman's method.  (When I say ``digital computing'', I am
aware that his procedure only solves a very specific problem, and for a
limited range of inputs at that.  Nevertheless, I think his procedure
deserves to be called digital computing, because an answer to a problem
was found by ``digital technology''.)

|> Part of the trouble is that using DNA alone, one has a hard
|> time making nearly irreversible dissipations and so fixing the result is hard.
|> Irreversible dissipations require more molecular machinery - like proteins that
|> burn ATP.  That would be harder to build.

This is a good point.  What do you think of the idea of trying to build a
``biomolecular Turing machine'' consisting of an enzyme complex (=read/write
head) working its way along a DNA chain (= tape)?  I would guess that
there is no theoretical reason to rule such a device out, since this sounds
so much like things that occur in bacteria every day, but that the technology
to design such fine tuned enzyme complexes is decades in the future--- unless
someone in a lab has a lucky break and finds a natural complex which ``almost
works''; then it would probably be quickly modified to do just what we want.

Chris Hillman  


From owner-info-theory@net.bio.net Wed Feb 15 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!ns1.faseb.org!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: DNA Programming
Message-ID: <D42sEn.AFD@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
References: <famulus-1402950955540001@ppp-83-4.bu.edu>
Date: Thu, 16 Feb 1995 04:27:58 GMT
Lines: 35

In article <famulus-1402950955540001@ppp-83-4.bu.edu> famulus@acs.bu.edu (Alex
Kasman) writes:

| Still, nobody would claim that this demonstrates the feasibility of solving
| differential equations using yams.

This reminds me of the methods that people used a long time ago to do
computations.  They would have mechanical devices that would have the
properties one would want and get the calculations by these analog computers.
They were things like a rotating disk.  Above the disk would be an arm with a
wheel at the end.  The calculation was something like an integration, but I
don't know further details.  Later people did use electrical circuits with
capacitors, resistors etc to solve differential equations.  Some of the nicest
analog computers are those that find the minimum surface spanning a region -
which is easily done by dipping a wire in soapy water!

| On the other hand, I see nothing like this in Adelman's work; just the fact
| that DNA anneals only for particular base pairings.

Here I agree.  The method is slow to program and there is no easy way to make
Boolean circuits.  Part of the trouble is that using DNA alone, one has a hard
time making nearly irreversible dissipations and so fixing the result is hard.
Irreversible dissipations require more molecular machinery - like proteins that
burn ATP.  That would be harder to build.

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www.ncifcrf.gov:2001/~toms
soon: 
  http://www-lmmb.ncifcrf.gov/~toms
works now: 
  http://www-lmmb.ncifcrf.gov/

From owner-info-theory@net.bio.net Thu Feb 16 22:00:00 1995
Path: biosci!rutgers!uwm.edu!reuter.cse.ogi.edu!netnews.nwnet.net!nntp.cac.washington.edu!math.washington.edu!hillman
From: hillman@math.washington.edu (Christopher Hillman)
Newsgroups: bionet.info-theory
Subject: Algorithmic Entropies versus Probabilistic Entropies
Date: 16 Feb 1995 23:42:22 GMT
Organization: "University of Washington, Mathematics, Seattle"
Lines: 221
Message-ID: <3i0nsu$ro3@news.u.washington.edu>
References: <3i06ml$1pv@hamilton.maths.tcd.ie>
NNTP-Posting-Host: magritte.math.washington.edu
Keywords: Chaitin, Shannon

In article <3i06ml$1pv@hamilton.maths.tcd.ie>,
tim@maths.tcd.ie (Timothy Murphy) writes:

|> hillman@math.washington.edu (Christopher Hillman) writes:
|> 
|> >|> In order to apply Shannon entropy,
|> >|> you first have to compute the probability distribution.
|> >|> To do this, you consider first single characters,
|> >|> then sequences of 2 characters, then 3 characters, and so on.
|> >|> You can never tell when you have gone far enough.
|> >|> All you can do is to approximate to the Shannon entropy,
|> >|> having decided that it is sufficient say to consider 3-sequences.
|> 
|> >Tim, that's not true.  Take a look at Peter Walters' textbook
|> >Introduction to Ergodic Theory, Springer, 1981.  The entropy you refer
|> >to is written there as h(S,\A) for a measure-preserving transformation
|> >S on some probabilty measure space (X,\M,\mu).  
|> 
|> My point is that in practice you don't have a measure space --
|> you have to compute the probabilities from the data.
|> In my opinion you actually choose the probabilities
|> to maximise the entropy, as far as you are able.
|> 
|> So I think the difference between us is this:
|> you are starting from a probability space,
|> but I am starting from raw data.
|> I believe the latter is more realistic in the case of DNA.

OK, I may have misunderstood what you had in mind here.  That's ironic,
because in connecting my algebraic entropy to source entropy I also take
a ``phenomenological'' approach and appeal to the fact that the empirical
frequencies converge to the true n-gram probabilities for all n and for
any given TYPICAL sequence to estimate analogues of what Shannon called
the entropy of the n-th order approximation to the source.  Then let
n tend to infinity.  But as you know, non-typical sequences can have
the ``wrong'' Chaitin entropy (and/or the ``wrong'' Galois entropy).
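
A minimal sketch of the empirical n-gram estimate described above, written
in Python.  The function names and the periodic test string are
illustrative assumptions, not anything taken from the posts.  It counts
overlapping n-grams in the head of a sequence and forms the plug-in
entropies; the difference H_n - H_(n-1) plays the role of the entropy of
the n-th order approximation to the source.

```python
from collections import Counter
from math import log2


def block_entropy(seq: str, n: int) -> float:
    """Plug-in Shannon entropy (bits) of the overlapping n-grams of seq."""
    if n < 1 or n > len(seq):
        raise ValueError("n must satisfy 1 <= n <= len(seq)")
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def conditional_entropy(seq: str, n: int) -> float:
    """H_n - H_(n-1): estimated bits per symbol given n-1 symbols of context."""
    if n == 1:
        return block_entropy(seq, 1)
    return block_entropy(seq, n) - block_entropy(seq, n - 1)


# Hypothetical test sequence: strictly periodic, so it is far from random.
periodic = "01" * 1000
h1 = conditional_entropy(periodic, 1)  # single symbols look like fair coin flips
h2 = conditional_entropy(periodic, 2)  # one symbol of context removes the uncertainty
```

On this string the single-symbol estimate is one bit, while the estimate
with one symbol of context is essentially zero.  That is the practical
point raised in the quoted text: stopping at too small an n can overstate
the entropy.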

But my objection does not concern whether it is reasonable to assume that
the sequence you are looking at is ``typical''.  It is not even
that many exact results are known for Shannon's entropy and associated
quantities, but very few are known for Chaitin's entropy.

My objection is this: whereas the Birkhoff ergodic theorem (a theoretical
result with PROFOUND practical consequences) assures us that we can estimate
the probabilities AS CLOSELY AS REQUIRED by looking at the ``head'' of what
we assume is a typical sequence produced by some ergodic source, there is
no such error control in the case of Chaitin's entropy.   There is no
way to be sure that you know the Chaitin entropy of a sequence to within
epsilon, because this depends so delicately on whether a given program really
halts or not, and because halting is so unpredictable.
 
|> >Shannon's entropy can be estimated AS CLOSELY AS DESIRED
|> >by the limiting process you describe (provided that you assume that
|> >your sequence is typical--- this is true with probability one but not
|> >with absolute certainty), whereas I am sure there is NO GENERAL METHOD
|> >for estimating Chaitin's entropy as closely as desired.   
|> 
|> I believe the situations are exactly the same.
|> Suppose you have a program p of length 2 outputting s: U(p) = s.
|> Then you know that H(s) <= 2.
|> It may be that one of the (two) programs of length 1, say "0", never completes.
|> So you will never know if this program might output s,
|> giving H(s) = 1.
|> However, you try U("0") for 1,000,000 steps and it hasn't completed.
|> You're pretty sure it isn't going to complete and output s.
|> 
|> It's just the same with strings in Shannon's Theory.
|> You may have tested sequences of length up to 1,000,000
|> and found all frequencies equal.
|> But when you try 1,000,001 sequences you find
|> the same pattern occurs over and over again,
|> thus reducing the entropy dramatically.

You mean it is possible to be fooled by a sequence which begins in an ``atypical''
manner and then ``settles down'' to more typical behavior?  Certainly, but the
point is that by sampling enough successive digits (of a single sequence, rather than
sampling many sequences) you can always make the probability of a given error
as low as desired.  Nothing like this is true for Chaitin's entropy--- indeed,
as you search through more and more programs, guessing that a given program
which seems to be taking a very long time to finish will in fact never halt gets
more and more risky--- the computation times of programs which DO halt vary
considerably even for small programs.   So even taking the empirical approach,
the Bayesian analysis approach to estimating source entropy seems to me quite
different from the approach you advocate for ``guessing'' (not estimating, since
you have no control of error) the Chaitin entropy.

So I claim that the two situations are completely different, from a practical
point of view as much as a theoretical point of view!
 
|> >As I mentioned
|> >in a previous post, Thomas and Cover do discuss in the book cited below
|> >some simple upper bounds for Chaitin's entropy,
|> >but there is no way to obtain an upper bound as close as desired to
|> >the actual value.   
|> 
|> This isn't really relevant,

I think the question of whether or not you can control the error is ESSENTIAL,
as is usually true in doing hard analysis or statistical estimation.

|> but I read Thomas & Cover after you recommended it,
|> and wasn't too happy at the treatment of Chaitin entropy.
|> Basically, they take Shannon entropy as the basic definition,
|> and try to see how Chaitin's approach fits in.

I think that is the most reasonable approach, given the theoretical and
practical problems with Chaitin's entropy.  Algorithmic information theory
is fascinating and has some very thought provoking results (such as
the Philosopher's Stone Theorem, a generalized Godel Theorem), but it remains
much less important for mathematics and applications overall than classical
information theory.  Eventually, algorithmic information theory may assume
more theoretical importance, but by its very nature seems unlikely to ever
be of much use for applications.
 
|> >I think you have a basic misunderstanding here.  The reason that Chaitin's
|> >entropy is uncomputable is very simple--- some programs never halt when
|> >run on U.  By the Halting Theorem, there is no way to tell whether some
|> >program you are testing on U will never halt or simply has a running time
|> >greater than the lifetime of the Universe.   
|> 
|> There is no way to tell, agreed.
|> But you will get more and more certain as you try for longer and longer.
|> When the program has run for 10^9 steps
|> you are willing to bet quite a lot it will never halt.
|> And even more that it will not halt with a given output.
|> [You see the miles and miles of tape filling your office,
|> and say to yourself, "There is no way that is all going to erase itself."]

That might seem intuitive, but I bet that if you formalize this and make
a very careful analysis, you will find that this intuition is completely
wrong.   Maybe an expert in algorithmic information theory can be found
to comment on this?  We are actually getting into some very sophisticated
questions here, it seems to me, and the only way to settle our disagreement
with certainty may be to do some hard math!

You might look again at Thomas & Cover to read about Kelly's
interpretation of information as the expected loss in gambling for money,
if you are interested in trying to formalize your argument (which I am
sure you would find breaks down).

|> >Chaitin's proof using algorithmic information
|> >theory of Godel's Theorem, and the relation to the Halting Theorem, are
|> >discussed in a beautiful article by Martin Davis cited below.
|> 
|> I'll certainly look at Martin Davis' article.
|> I think his book on Computability is one of the best textbooks I know.
|> I confess I didn't understand Chaitin's proof of Godel's Theorem.
|> He claimed somewhere that it follows from the fact that
|> the information in a theorem cannot exceed the information in the axioms
|> from which one starts,
|> but when one looked at the details this didn't in fact seem to be his argument.

I also find Chaitin's papers very hard to follow--- they seem to be written
in a private language, which is no doubt a consequence of his unusual biography.
I found Davis' account in the article quite readable.   I just remembered that there
is another VERY recent book which gives a clear account of Chaitin's proof, but
I presently seem to be unable to locate it in our computerized catalog.  It has
a funny title which I cannot remember, unfortunately.
 
|> >|> I believe that these 2 uncertainties are in fact exactly the same.
|> 
|> >That is not true.  
|> 
|> I confess that I have been infected with a slight doubt about this,
|> by the example of pi.
|> In fact nothing is known, I believe, about the frequency of digits
|> or digit sequences in pi.
|> (There may be no 0's after the first 10^9 digits.)

As far as I know, this is correct.  To my knowledge, pi has never been
proven to be ``normal'' in the sense of Borel.

|> However, it seems to me that it _could_ be known
|> that all sequences of length n occur with equal frequency.

For all n?  For all bases (base ten, base two, etc) ?  This is what it means
for a number to be normal in the sense of Borel.  Also, there ARE certain
sequences (e.g. Champernowne's sequence) which are KNOWN to be normal.
Champernowne's sequence provides an example of a number with zero
algorithmic entropy and source entropy log 2 (if we assume the
sequence is typical for some source, we can easily see that source
must be a Bernoulli source.)   Champernowne's sequence can be defined
as follows:

  0,1,00,01,10,11,000,001,010,011,100,101,110,111,0000,...

This obviously has algorithmic entropy zero (in the sense of taking
the limit K(x_n)/n, where K(x_n) is the Chaitin entropy of the first n
bits), and can be proven to be normal (I have references somewhere to
the original paper by Champernowne--- the proof is nontrivial, but
the result is very plausible.)
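
A short sketch of why the algorithmic entropy is zero: a fixed-size
program generates as much of the sequence as desired, so a program for
the first n bits needs little more than a description of n.  The function
name below is an assumption made for illustration.  The sketch also
checks the single-digit frequency on a prefix made of complete blocks,
which is only the simplest consequence of normality and not a proof of it.

```python
from itertools import product


def champernowne_prefix(max_len: int) -> str:
    """Concatenate all binary strings of length 1..max_len, shortest first."""
    return "".join(
        "".join(bits)
        for k in range(1, max_len + 1)
        for bits in product("01", repeat=k)
    )


x = champernowne_prefix(10)
ones_fraction = x.count("1") / len(x)  # every complete block is balanced
```

Because every complete block of k-bit strings contains equally many 0's
and 1's, the fraction of 1's in such a prefix is exactly one half.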

|> I don't think this invalidates my general argument above,
|> that Chaitin entropy is not in fact any harder to compute than Shannon entropy,
|> if one doesn't suppose at the outset that the probabilities are known.

Take a look at the paper by Ornstein and Weiss.  I am sure that the situation
with estimating Shannon's entropy and its derivative entropies is much, much
better than for ``guessing'' Chaitin's entropy, because it is known 
(if I understand the situation correctly) that there is no
``increasingly reliable'' method of guessing Chaitin's entropy for
an arbitrary sequence.   In short, I stand by my claim that although there
ARE easy and general UPPER bounds for Chaitin's entropy, there is no way of
estimating the entropy of a given sequence from a finite set of its terms
in the sense of being able to come arbitrarily close to the actual value by
computing more and more terms.
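
The easy upper bounds mentioned above can be illustrated with any
off-the-shelf compressor.  The sketch below assumes zlib as the
compressor and uses made-up test strings.  The compressed length bounds
the complexity from above only up to an additive constant for the
decompressor, and it gives no information about how far above the true
value it lies.

```python
import random
import zlib


def compressed_bits_per_symbol(data: bytes) -> float:
    """Bits in the zlib encoding of data, divided by the number of input bytes."""
    return 8 * len(zlib.compress(data, 9)) / len(data)


# Hypothetical inputs: a pure repeat and a seeded pseudo-random A/C string.
repetitive = b"CCACAC" * 500
rng = random.Random(0)  # fixed seed so the run is repeatable
shuffled = bytes(rng.choice(b"AC") for _ in range(len(repetitive)))

low = compressed_bits_per_symbol(repetitive)
high = compressed_bits_per_symbol(shuffled)
```

The repeat compresses far better than the pseudo-random string.  Note
that the pseudo-random string itself has a very short description (the
seed and the generator), which the compressor fails to find; this is the
sense in which such bounds cannot be tightened at will.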

You overlooked one of my most important objections: the Chaitin entropy of
a FINITE string depends on the universal Turing machine you use to run
your programs.   Which UTM do you propose to use for DNA and why
not some other?   If you want to use the entropy per bit of an infinite sequence,
as in my IDAC post, you have to worry about whether \lim_{n -> infty} K(x_n)/n
converges at all (it might not).  It is true that theorems discussed in the
paper by White imply that for a shift of finite type (equipped with its
maximal entropy measure as described in IDAC) a typical sequence must have
algorithmic entropy equal to the source entropy, so for very special systems
you CAN approximate the algorithmic entropy of typical sequences.  This
doesn't contradict what I said above because the typical sequences have
special properties.

Chris Hillman

From owner-info-theory@net.bio.net Thu Feb 16 22:00:00 1995
Path: biosci!SNFMA1.IF.USP.BR!szeinfel
From: szeinfel@SNFMA1.IF.USP.BR (Rafael Iosef Najmanovich Szeinfeld {S)
Newsgroups: bionet.info-theory
Subject: physical meaning of information.
Date: 17 Feb 1995 07:01:39 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 17
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <Pine.SUN.3.91.950217125304.4295A-100000@snfma1.if.usp.br>
References: <3i24uq$53f@hamilton.maths.tcd.ie>
NNTP-Posting-Host: net.bio.net

	Hi All,
	There is one point that does not permit me to fully accept information
theory. The point is whether information has any physical background or is
just a manipulation of ideas not directly related to nature. In the latter
case, why use different names for the same quantity (entropy)?
	Hope my preoccupation is not too naive.

				Rafael.

*---------------------------------------------------------------------*
* Rafael Iosef Najmanovich Szeinfeld | Depto. de Fisica Geral         *
* Statistical Mechanics Group        | Instituto de Fisica            *
* General Physics Department         | Universidade Sao Paulo         *
* Physics Institute                  | Rua do Matao s/n CEP 01452-990 *     
* University of Sao Paulo - Brazil   | Caixa Postal 20516             * 
* E-mail: szeinfel@snfma1.if.usp.br  | Sao Paulo - Brasil             *
*---------------------------------------------------------------------*

From owner-info-theory@net.bio.net Thu Feb 16 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!Germany.EU.net!ieunet!maths.tcd.ie!not-for-mail
From: tim@maths.tcd.ie (Timothy Murphy)
Newsgroups: bionet.info-theory
Subject: Re: Algorithmic Entropies versus Probabilistic Entropies
Date: 17 Feb 1995 12:31:22 -0000
Organization: Dept. of Maths, Trinity College, Dublin, Ireland.
Lines: 53
Message-ID: <3i24uq$53f@hamilton.maths.tcd.ie>
References: <3i06ml$1pv@hamilton.maths.tcd.ie> <3i0nsu$ro3@news.u.washington.edu>
NNTP-Posting-Host: hamilton.maths.tcd.ie
Keywords: Chaitin, Shannon

hillman@math.washington.edu (Christopher Hillman) writes:

>|> >As I mentioned
>|> >in a previous post, Thomas and Cover do discuss in the book cited below
>|> >some simple upper bounds for Chaitin's entropy,
>|> >but there is no way to obtain an upper bound as close as desired to
>|> >the actual value.   
>|> 
>|> This isn't really relevant,

>I think the question of whether or not you can control the error is ESSENTIAL,
>as is usually true in doing hard analysis or statistical estimation.

>|> but I read Thomas & Cover after you recommended it,
>|> and wasn't too happy at the treatment of Chaitin entropy.

Just to be clear.
I meant that _my_ following remark was probably not relevant,
not _your_ reference to Thomas & Cover !

>You overlooked one of my most important objections: the Chaitin entropy of
>a FINITE string depends on the universal Turing machine you use to run
>your programs.   Which UTM do you propose to use for DNA and why
>not some other?   

It is true that H(s) for finite sequences is only defined up to a constant,
depending on the choice of universal machine.
However, I don't believe Shannon's theory is superior in this regard,
since it is dealing with probabilities which are -- or could be --
defined as limits of ratios as the number of cases considered tends to infinity.

In fact Chaitin is more precise.
In effect Shannon's theory tells us H(s) to within o(|s|),
i.e. within a quantity that grows more slowly than the length of the string s.
It may be possible to improve this, say to o(log |s|),
but not to the constant that Chaitin's theory gives.

Incidentally, it seems to me to be an interesting question to ask
if one _could_ choose a particular universal machine
by some objective criteria.
It doesn't seem to me obvious that one can or that one can't.
[It wouldn't matter if the criteria gave a finite number of machines,
rather than just 1.]

However, it seems to me that we have each expressed our points of view
fairly clearly.
Time will tell if Chaitin's notion comes to displace that of Shannon.

-- 
Timothy Murphy  
e-mail: tim@maths.tcd.ie
tel: +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland

From owner-info-theory@net.bio.net Thu Feb 16 22:00:00 1995
Path: biosci!adam.cc.sunysb.edu!news.nysernet.net!news.sprintlink.net!pipex!sunsite.doc.ic.ac.uk!daresbury!trane.uninett.no!sunic!news.uni-c.dk!find2.denet.dk!edb-ht
From: edb-ht@find2.denet.dk (H Thygesen)
Newsgroups: bionet.info-theory
Subject: Re: physical meaning of information.
Date: 17 Feb 95 16:27:49 GMT
Organization: News Server at UNI-C, Danish Computing Centre for Research and Education.
Lines: 23
Message-ID: <edb-ht.793038469@find2.denet.dk>
References: <3i24uq$53f@hamilton.maths.tcd.ie> <Pine.SUN.3.91.950217125304.4295A-100000@snfma1.if.usp.br>
NNTP-Posting-Host: find2.denet.dk

szeinfel@SNFMA1.IF.USP.BR (Rafael Iosef Najmanovich Szeinfeld {S) writes:

>	Hi All,
>	There is one point that does not permit me to fully accept information
>theory. The point is whether information has any physical background or is
>just a manipulation of ideas not directly related to nature. In the latter
>case, why use different names for the same quantity (entropy)?
>	Hope my preoccupation is not too naive.

>				Rafael.

There are at least 100 different answers to that question. Here is one of
them:

Within probability theory, we distinguish between two schools: the
subjectivists, who claim that probability only reflects human beings' lack
of ability to predict the future (because we don't have access to all
information), and the objectivists, who claim that it makes sense to speak
about the "true" probability that something happens.

As entropy is (or at least can be) defined in terms of probability
distributions, an objectivist would see entropy and information as
something physical, whereas a subjectivist would not.

Helge

From owner-info-theory@net.bio.net Thu Feb 16 22:00:00 1995
Path: biosci!rutgers!uwm.edu!news.moneng.mei.com!howland.reston.ans.net!news.sprintlink.net!uunet!nntp.cac.washington.edu!math.washington.edu!hillman
From: hillman@math.washington.edu (Christopher Hillman)
Newsgroups: bionet.info-theory
Subject: Estimation of Entropies from a Finite Amount of Data
Date: 17 Feb 1995 19:44:54 GMT
Organization: "University of Washington, Mathematics, Seattle"
Lines: 121
Distribution: world
Message-ID: <3i2ubm$6ru@news.u.washington.edu>
References: <3i24uq$53f@hamilton.maths.tcd.ie>
NNTP-Posting-Host: escher.math.washington.edu
Keywords: Chaitin, Shannon, algorithmic complexity, law of iterated logarithm

In article <3i24uq$53f@hamilton.maths.tcd.ie>,
tim@maths.tcd.ie (Timothy Murphy) writes:

|> hillman@math.washington.edu (Christopher Hillman) writes:
|> 
|> >|> >As I mentioned
|> >|> >in a previous post, Thomas and Cover do discuss in the book cited below
|> >|> >some simple upper bounds for Chaitin's entropy,
|> >|> >but there is no way to obtain an upper bound as close as desired to
|> >|> >the actual value.   
|> >|> 
|> >|> This isn't really relevant,
|> 
|> >I think the question of whether or not you can control the error is ESSENTIAL,
|> >as is usually true in doing hard analysis or statistical estimation.
|> 
|> >|> but I read Thomas & Cover after you recommended it,
|> >|> and wasn't too happy at the treatment of Chaitin entropy.
|> 
|> Just to be clear.
|> I meant that _my_ following remark was probably not relevant,
|> not _your_ reference to Thomas & Cover !

OK, I misunderstood this.
 
|> >You overlooked one of my most important objections: the Chaitin entropy of
|> >a FINITE string depends on the universal Turing machine you use to run
|> >your programs.   Which UTM do you propose to use for DNA and why
|> >not some other?   
|> 
|> It is true that H(s) for finite sequences is only defined up to a constant,
|> depending on the choice of universal machine.
|> However, I don't believe Shannon's theory is superior in this regard,
|> since it is dealing with probabilities which are -- or could be --
|> defined as limits of ratios as the number of cases considered tends to infinity.

I don't understand how this relates to the fact that the Chaitin complexity K(s)
of finite strings s is only defined up to a BOUNDED FUNCTION g(s), in this
sense: if you compare the complexities defined relative to two different
universal Turing Machines, say K_U(s) and K_V(s), then

       |K_U(s) - K_V(s)| \leq g(s)

where g(s) is a non-negative bounded function on strings (depending on U,V).
I am pretty sure that g is itself an uncomputable function.  (Hopefully the
computer scientists will correct me if I am wrong about this.) This
would mean that the problem of imprecise definition independent of universal
Turing machine is much worse than if we had some constant C (depending on U,V)
such that
    
       K_U(s) = K_V(s) + C

(Note to other readers: this issue is discussed in Thomas & Cover's book.)
 
|> In fact Chaitin is more precise.

I think that the matter of error control in estimating these two entropies
from data (the first n terms of a sequence we assume is a typical sequence
produced by an unknown message source) shows that it is more accurate to
say that Shannon's theory is more useful in practice, because it is
inherently impossible to estimate Chaitin entropies accurately, but by
taking enough terms one can estimate Shannon entropies as closely as desired
(always assuming that the sequence is indeed typical, but this seems very
reasonable since this event has probability one).
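
A minimal simulation of this claim for the simplest case, a Bernoulli
source.  The parameter p = 0.3, the sample sizes, and the seed are
arbitrary assumptions.  The entropy is estimated from the first n symbols
of a single simulated sequence and compared with the true value.

```python
import random
from math import log2


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) source."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def plug_in_estimates(p: float, sizes, seed: int = 0):
    """Entropy estimated from the first n symbols of one simulated sequence."""
    rng = random.Random(seed)
    bits = [rng.random() < p for _ in range(max(sizes))]
    return {n: binary_entropy(sum(bits[:n]) / n) for n in sizes}


true_h = binary_entropy(0.3)
estimates = plug_in_estimates(0.3, [100, 10_000, 100_000])
errors = {n: abs(h - true_h) for n, h in estimates.items()}
```

For an independent source like this one the error shrinks roughly like
1/sqrt(n).  The sketch says nothing about sources with long-range
dependence, where no such rate need hold.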

Technical note: I should confess that the situation with regard to Shannon
entropies (estimating probabilities from empirical frequencies computed
from a finite number of terms) is not quite as good as my statements might
seem to imply.  Complete error control means that you can make a GENERAL
statement about the SPEED of convergence as the number of terms increases,
but it is known that no such statement can be made for the convergence
of empirical frequencies to probabilities which is implied by
Birkhoff's ergodic theorem.   See the book

   Karl Petersen, Ergodic theory, Cambridge University Press, 1983

for a detailed discussion.  Nevertheless, the fact that convergence 
occurs at all for probabilities shows that Shannon's entropy is
inherently ``more estimable from data'' than is Chaitin's entropy.

|> In effect Shannon's theory tells us H(s) to within o(|s|),
|> i.e. within a quantity that grows more slowly than the length of the string s.
|> It may be possible to improve this, say to o(log |s|),

You might look at Petersen's discussion of the law of the iterated logarithm,
which holds for Bernoulli systems.   This is the best possible result
and it gives an error bound of form O(sqrt{|s| log log |s|}), with probability
one.

|> but not to the constant that Chaitin's theory gives.

Could you explain what constant you are talking about here?
 
|> Incidentally, it seems to me to be an interesting question to ask
|> if one _could_ choose a particular universal machine
|> by some objective criteria.
|> It doesn't seem to me obvious that one can or that one can't.
|> [It wouldn't matter if the criteria gave a finite number of machines,
|> rather than just 1.]

It is possible to completely specify the design of a universal Turing machine,
and more than one design is possible.  Nothing essential is lost if you think
of universal Turing machines as digital computers; now it is obvious that
the complexities defined by an IBM PC and a Macintosh will disagree in some
complicated (but bounded) way, and it is also obvious that there is no
theoretical obstruction to specifying a universal Turing machine (although
in practice it is not easy unless you are truly expert at designing digital
computers!).
 
|> However, it seems to me that we have each expressed out points of view
|> fairly clearly.

I think I now understand your thinking better, but as noted above there are
still some points which remain unclear, at least to me.

|> Time will tell if Chaitin's notion comes to displace that of Shannon.
 
Can I take this to mean you now agree they are distinct (but related)
concepts?  :)

Chris Hillman

From owner-info-theory@net.bio.net Fri Feb 17 22:00:00 1995
Path: biosci!VOSCC.NAGAOKAUT.AC.JP!kmatsuno
From: kmatsuno@VOSCC.NAGAOKAUT.AC.JP (koichiro matsuno/7129)
Newsgroups: bionet.info-theory
Subject: Re: Yockey Definitions
Date: 18 Feb 1995 05:04:33 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 57
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <9502181308.AA00134@voscc.nagaokaut.ac.jp>
NNTP-Posting-Host: net.bio.net

In article <3i00ti$it0@newsbf02.news.aol.com> hpyockey@aol.com (HPYockey)
writes:

>What did you think of my comments on the origin of life and the
>self-organization of Manfred Eigen? That is a lot more controversial than
>the Shannon-McMillan Theorem.

   May I jump in here? Yockey made the same point almost fifteen years ago 
and said that "The argument made in all origin of life scenarios is that one 
can trace the origin of life by going back in time along evolutionary 
pathways to simpler and yet more simple organisms" (Yockey, H. P., 1981, 
J. Theor. Biol. 91, 13). I remember this statement because Sidney Fox and
I quoted it in our JTB paper (1983, J. Theor. Biol. 101, 321) while
responding to his complaint about all self-organization scenarios. But I
am not going to repeat the same old story again ;-). My intention is quite
the opposite.

   Yockey's emphasis on the Shannon-McMillan theorem is sound and 
unquestionable. Its corollary, however, is that information thus conceived
is synchronic in the sense that the source matrix of information or how to 
partition the probability space, once fixed, remains invariant in time. 
This point would become most acute if an evolutionary historical
development is the case. If an invariant ensemble of evolutionary _histories_
is available in a synchronic and time-independent manner, the high 
probability sequences would certainly tell us something very significant.
In contrast, if only a single sequence of historical development is 
available as the evolutionary process proceeding on our earth, synchronic 
information would not be of much use even if it is legitimate. Surely, an
uneasiness with synchronic information is visible in the article
<3glqca$pof@mserv1.dl.ac.uk> of Avi Elitzur <CFELI@WEIZMANN.weizmann.ac.il>:

>The impression of information theory emerging from Yockey's 
>book is that of a purely technical tool, hardly interesting for the
>biologist.  While offering detailed analyses of DNA sequences, Yockey
>dismisses any attempt to go beyond that.  

   Information referring to a single sequence of evolutionary development is
diachronic and cannot enjoy the conceptual rigor that synchronic
information takes for granted. Synchronic information relates to entropy on
the invariant probability space as diachronic information relates to meaning
in the sequence of experience. Yockey has been most keen in warning against
muddling both synchronic and diachronic information in an unprincipled 
manner. We should listen to him at this point. At the same time, diachronic
information awaits a new challenge because our experiences in general and
evolutionary processes in particular are simply diachronic, though a lot of 
seemingly synchronic pieces could be available if cut arbitrarily.

   Regards,
   Koichiro

   Koichiro Matsuno
   Department of BioEngineering 
   Nagaoka University of Technology
   Nagaoka 940-21, Japan
 
   kmatsuno@voscc.nagaokaut.ac.jp


From owner-info-theory@net.bio.net Fri Feb 17 22:00:00 1995
Path: biosci!agate!sunsite.doc.ic.ac.uk!doc.news.pipex.net!pipex!uknet!daresbury!not-for-mail
From: gad yagil <LCYAGIL@WEIZMANN.weizmann.ac.il>
Newsgroups: bionet.info-theory
Subject: telomeres
Date: 18 Feb 1995 22:14:16 -0000
Lines: 89
Sender: lpddist@mserv1.dl.ac.uk
Distribution: bionet
Message-ID: <3i5rfo$crj@mserv1.dl.ac.uk>
X-Acknowledge-To: <LCYAGIL@WEIZMANN>
Original-To: bioinfonetters <bio-info@dl.ac.uk>


Tim Murphy writes:
| My question is simply whether H(s) where s is the DNA string
| could make a sensible definition of the "complexity" of a creature.

Tom Schneider answers:
>I believe that that is approaching the problem in a way
> that is fruitless. Lots has been written about it and no measurements
> and no conceptual advances have been made!  We already know how
> much DNA there is, and CoT curves of the "complexity" were done 30
> years ago.  It doesn't teach us anything new.
> (If you don't believe that, then tell me something new!)

Here is an example which may teach us something new:

These are the first 363 bases of the complete nucleotide sequence of
yeast chromosome III:  (The end regions of a chromosome are called
telomeres):

       1  CCCACACACC ACACCCACAC CACACCCACA CACCACACAC ACCACACCCA
      51  CACACCCACA CCACACCACA CCCACACCAC ACCCACACAC CCACACCCAC
     101  ACACCACACC CACACACACC ACACCCACAC ACACCCACAC CCACACACCA
     151  CACCCACACA CACACCACAC CCACACACAC CACACCACAC CCACACCACA
     201  CCCACACCCA CACACCACAC CCACACCCAC ACCCCACACC CACACACCAC
     251  ACCCACACAC ACCACACCCA CACACACCCA CACCACACCC ACACACCACA
     301  CCCACACACC CACACCCACA CACACCACAC CCACACCACA CCCACACCCA
     351  CACACCCACA CCC (TAACAC....

This telomeric region, like other telomeric regions so far, is
dominated by a single short motif: CCACAC. It is clearly less complex
than most genomic DNA. Here is the shortest program I managed to
formulate today for the sequence:

B = CCACAC           6   (A is adenine, C is Cytosine;
D = CCCACAC          7    The complementary strand is
E = ACA              3    naturally all G,T).
H = CACA  (=CE)      2
P = BB               2

1         DAPHDAEPADHPHDADBAPEPEDBAPAHPEBHDH    208
209       DBABPPAPEPEABHDAPABDEPHDBADCC         363

Substitution of the 5 definitions above into this 63-letter string yields
uniquely the complete telomere sequence. This string is probably not the
shortest description of the sequence. The length of the algorithm is 83 steps:
63 to print the string and 20 more algorithmic steps involved in
the five definitions listed. This yields a relative complexity of
83/363 = 0.229. I challenge the mathematicians
on this net to find and prove (if provable) a "super" algorithm to
determine the real shortest program.
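
The step count above can be reproduced mechanically.  The sketch below
stores the five definitions with the costs listed in the table and the
two program rows as posted; the variable and function names are
assumptions.  It checks only the bookkeeping (63 + 20 = 83) and shows how
a definition unfolds.  It does not verify that the posted 63-letter
string expands to the 363 bases.

```python
# Definitions as listed: letter -> (right-hand side, cost in steps).
definitions = {
    "B": ("CCACAC", 6),
    "D": ("CCCACAC", 7),
    "E": ("ACA", 3),
    "H": ("CE", 2),
    "P": ("BB", 2),
}
# The two program rows from the post, joined into one string.
program = "DAPHDAEPADHPHDADBAPEPEDBAPAHPEBHDH" + "DBABPPAPEPEABHDAPABDEPHDBADCC"


def expand(symbols: str) -> str:
    """Recursively replace defined letters; A and C stand for themselves."""
    return "".join(
        expand(definitions[ch][0]) if ch in definitions else ch
        for ch in symbols
    )


definition_cost = sum(cost for _, cost in definitions.values())
total_cost = len(program) + definition_cost
relative_complexity = total_cost / 363
```

Like any explicit program, this gives an upper bound on the description
length under one particular accounting of steps; a different accounting
(for example, charging for the separators between definitions) would
shift the number.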

The relatively low complexity of the telomeric region is really no
surprise. It has long been known that telomeres are replicated not by the DNA
polymerase which replicates the rest of the chromosome, but by a special
enzyme called telomerase, which uses a short RNA template (possibly
CCACAC in yeast) to create this sequence. It does so with a considerable
amount of not yet completely understood "stuttering".

Nevertheless, a complexity analysis based on the Kolmogorov-Chaitin
concept may well be of predictive value for non-conventionally
replicated DNA regions - telomeres and other yet to be discovered regions.

I hope this satisfies Tom Schneider that algorithmic complexity has
a place in theoretical molecular biology, and that  the position taken
by Tim Murphy is a legitimate and well taken one.

May I draw the attention of all of you to a series of papers on
complexity analysis, not of sequences, but of real molecular structures,
which I realize today to be conceptually applications of the Kolmogorov
complexity concept:

Yagil, G. (1985). On the structural complexity of simple biosystems.
J. Theor. Biol., 112: 1-23.

Yagil, G. (1993a). Complexity analysis of a protein molecule. In:
Mathematics Applied to Biology and Medicine, J. Demongeot and V.
Capasso, Eds., Wuerz Publishing, pp. 305-313.

Yagil, G. (1993b). On the structural complexity of templated systems.
In: "1992 Lectures in Complex Systems", L. Nadel and D. Stein, Eds. The
Santa Fe Institute and Addison-Wesley, New York.

I shall be glad to provide reprints to those who send me their snail-mail
addresses.

Gad Yagil
Dept. of Cell Biology,
The Weizmann Institute of Science, Rehovot,IL
lcyagil@weizmann.weizmann.ac.il

From owner-info-theory@net.bio.net Fri Feb 17 22:00:00 1995
Path: biosci!agate!library.ucla.edu!csulb.edu!nic-nac.CSU.net!charnel.ecst.csuchico.edu!olivea!spool.mu.edu!howland.reston.ans.net!Germany.EU.net!ieunet!maths.tcd.ie!not-for-mail
From: tim@maths.tcd.ie (Timothy Murphy)
Newsgroups: bionet.info-theory
Subject: Re: Estimation of Entropies from a Finite Amount of Data
Date: 18 Feb 1995 18:37:19 -0000
Organization: Dept. of Maths, Trinity College, Dublin, Ireland.
Lines: 95
Message-ID: <3i5eov$c0s@hamilton.maths.tcd.ie>
References: <3i24uq$53f@hamilton.maths.tcd.ie> <3i2ubm$6ru@news.u.washington.edu>
NNTP-Posting-Host: hamilton.maths.tcd.ie
Keywords: Chaitin, Shannon, algorithmic complexity, law of iterated logarithm

hillman@math.washington.edu (Christopher Hillman) writes:

>I don't understand how this relates to the fact that the Chaitin complexity K(s)
>of finite strings s is only defined up to a BOUNDED FUNCTION g(s), in this
>sense: if you compare the complexities defined relative to two different
>universal Turing Machines, say K_U(s) and K_V(s), then

>       |K_U(s) - K_V(s)| \leq g(s)

>where g(s) is a non-negative bounded function on strings (depending on U,V).
>I am pretty sure that g is itself an uncomputable function.  (Hopefully the
>computer scientists will correct me if I am wrong about this.) 

Well, I'm not a computer scientist,
but I'm sure this is wrong.
In fact

       |K_U(s) - K_V(s)| \leq C

where C = C(U,V) is a constant.
By definition, U can emulate V, ie there is a prefix u such that

	U(up) = V(p)

for all p. It follows that

	K_U(s) \le K_V(s) + |u|,

where |u| is the length of the string u;
and similarly with U,V interchanged.
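
Spelled out, the two-sided bound might be written as follows. This is only a
sketch of the step left implicit above; the symbol v (a prefix with
V(vp) = U(p) for all p) is a name introduced here for the interchanged case
and does not appear in the original post.

```latex
\begin{align*}
K_U(s) &\le K_V(s) + |u|, \\
K_V(s) &\le K_U(s) + |v|, \\
\text{hence}\quad -|v| &\le K_U(s) - K_V(s) \le |u|, \\
\text{so}\quad |K_U(s) - K_V(s)| &\le \max(|u|, |v|) =: C(U,V).
\end{align*}
```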

>|> Incidentally, it seems to me to be an interesting question to ask
>|> if one _could_ choose a particular universal machine
>|> by some objective criteria.
>|> It doesn't seem to me obvious that one can or that one can't.
>|> [It wouldn't matter if the criteria gave a finite number of machines,
>|> rather than just 1.]

>It is possible to completely specify the design of a universal Turing machine,
>and more than one design is possible.  Nothing essential is lost if you think
>of universal Turing machines as digital computers; now it is obvious that
>the complexities defined by an IBM PC and a Macintosh will disagree in some
>complicated (but bounded) way, and it is also obvious that there is no
>theoretical obstruction specifying a universal Turing machine (although
>in practice it is not easy unless you are truly expert at designing digital
>computers!).

I didn't express my question clearly enough.
Suppose one regards a Turing machine T as a black box,
determined only by the function T(p) it defines.
One could define the "distance" between 2 universal machines U,V
by the constant C above, say d(U,V) = C.
It might be that one could find a machine U
for which let us say \sum_V d(U,V)^{-3} was a minimum.
I would regard that as an objective criterion for choosing U.

On the other hand, there might exist some kind of "distance-preserving"
transformation among universal machines
which would make it clear that there were an infinite number of U
minimising the sum above.

This is probably a rather technical question,
and not really relevant to the point at issue.

>|> Time will tell if Chaitin's notion comes to displace that of Shannon.
> 
>Can I take this to mean you now agree they are distinct (but related)
>concepts?  :)

Well, I would see quite a close parallel with the earlier development
of the concept of entropy in thermodynamics.

Originally, it was introduced as a "thermodynamic potential" --
the integral of a quantity that arose in the study of gases.

Later, this was given an interpretation through statistical mechanics
in terms of the motions of molecules.

But later again, it seemed that the thermodynamic notion extended
to cases which were not covered by statistical mechanics
(eg solids).

My impression is that most mathematical physicists would say
that there was a quantity called entropy
which could be interpreted statistically in many cases.

I would feel more or less the same about Chaitin's definition
vis-a-vis that of Shannon.


-- 
Timothy Murphy  
e-mail: tim@maths.tcd.ie
tel: +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland

From owner-info-theory@net.bio.net Fri Feb 17 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!news.sprintlink.net!uunet!nntp.cac.washington.edu!math.washington.edu!hillman
From: hillman@math.washington.edu (Christopher Hillman)
Newsgroups: bionet.info-theory
Subject: Does Adleman's Procedure Really Perform a Digital Computation?
Date: 18 Feb 1995 04:52:28 GMT
Organization: "University of Washington, Mathematics, Seattle"
Lines: 66
Distribution: world
Message-ID: <3i3uec$bh5@news.u.washington.edu>
NNTP-Posting-Host: escher.math.washington.edu
Keywords: DNA soup, wet computing, universal computer

In my opinion, yes.  For a contrary view, I repost by permission the
following:
----------------------------------------------------------------------------

Newsgroups: sci.math.research
From: traverso@barsotti.dm.unipi.it (Carlo Traverso)
Subject: Re: Q: ref for DNA used to solve math. network (lin. alg.) problem
Message-ID: <TRAVERSO.95Feb17114914@pappo.posso.dm.unipi.it>
Organization: Universita' di Pisa
Date: Fri, 17 Feb 1995 10:49:14 GMT
Approved: Greg Kuperberg <greg@math.uchicago.edu>, moderator for sci.math.research

[Mod note:  This discussion seems to have turned to popular mathematics,
so I am directing followups to sci.math.  -Greg]

>>>>> "Matthew" == Matthew P Wiener <weemba@sagi.wistar.upenn.edu> writes:
In article <3htoe1$i22@netnews.upenn.edu>
weemba@sagi.wistar.upenn.edu (Matthew P Wiener) writes:


    Matthew> In article <famulus-1302951122520001@ppp-82-25.bu.edu>,
    >> famulus@acs (Alex Kasman) writes:
    >> Consequently, one could continuously measure the temperature of an
    >> actual yam and thereby solve one particular differential equation.
    >> Still, nobody would claim that this demonstrates the feasibility of
    >> solving differential equations using yams.

    Matthew> Sure they would.  Analog computing has a long and respectable history,
    Matthew> which has mostly vanished thanks to digital computers.
    Matthew> -- 
    Matthew> -Matthew P Wiener (weemba@sagi.wistar.upenn.edu)


In my opinion, a computational PROOF requires an exactly reproducible
COMPUTATION. Numerical computations on a digital computer are exactly
reproducible, while analog computing can at most give a probabilistic
answer. Hence a "proof" relying on an analog computation is at most as
valid as a proof that uses a Monte Carlo algorithm; and I believe that
nobody can assert that in that case we have a valid mathematical proof
(at most one would say that the truth of the result has a very high
probability - whatever this means).
Of course, a numerical computation too cannot provide a proof, unless
an error analysis with an exact proof concludes that the computation
is reliable.

One can argue that even digital computing is probabilistic:
apart from bugs (hardware or software), a computation may be
influenced by rare electrical phenomena (just as a human proof
can have bugs, and a printed 3 can easily become an 8). But it is the
EXACT reproducibility that makes the difference.

A more delicate issue is a Las Vegas proof: an algorithm relying on a
choice of a random element, and such that the algorithm can either
give a result (exact) or fail, with predetermined probability. 
In that case I would say that the complete proof should include the
random element choice, and not be limited to the algorithm and to an
assertion that the computation succeeded. 
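
A minimal sketch of the distinction drawn above, in Python. The choice of a
Fermat compositeness test and the names check_witness and find_witness are
assumptions made purely for illustration; nothing here comes from the
original discussion. The random search may fail, but when it succeeds the
base it returns is a certificate that anyone can re-check deterministically,
which is one way to read the remark that the random element belongs in the
written proof.

```python
import random

def check_witness(n, a):
    """Deterministic check: True if base a proves that n is composite.

    By Fermat's little theorem, a prime n satisfies a**(n-1) % n == 1
    for every a in 2..n-2, so any other result rules out primality.
    """
    return pow(a, n - 1, n) != 1

def find_witness(n, rng, tries=20):
    """Las Vegas style search: return a witness, or None on failure."""
    for _ in range(tries):
        a = rng.randrange(2, n - 1)  # random base in 2..n-2
        if check_witness(n, a):
            return a  # the random element that goes into the proof
    return None  # failure: nothing has been proved either way

if __name__ == "__main__":
    n = 91  # 7 * 13, an illustrative composite
    a = find_witness(n, random.Random())
    if a is None:
        print("search failed; no claim is made about", n)
    else:
        print(n, "is composite; certificate a =", a)
```

Re-running check_witness(n, a) with the recorded a reproduces the result
exactly, whatever random choices led to it.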

Carlo Traverso
Dept. of Mathematics
Pisa - Italy 



 


From owner-info-theory@net.bio.net Sat Feb 18 22:00:00 1995
Path: biosci!bloom-beacon.mit.edu!spool.mu.edu!uwm.edu!psuvax1!news.pop.psu.edu!hudson.lm.com!godot.cc.duq.edu!nntp.club.cc.cmu.edu!casaba.srv.cs.cmu.edu!das-news2.harvard.edu!fas-news.harvard.edu!fas!berriz
From: Gabriel Berriz <berriz@husc.harvard.edu>
Newsgroups: bionet.info-theory
Subject: Ref. needed: state of the art in info th & mol bio
Date: 19 Feb 1995 14:22:35 GMT
Organization: Harvard University, Cambridge, Massachusetts
Lines: 16
Message-ID: <3i7k7b$s4q@decaxp.harvard.edu>
Reply-To: berriz@husc.harvard.edu
NNTP-Posting-Host: fas-2.harvard.edu
X-Newsreader: NN version 6.5.0 #3 (NOV)
Originator: berriz@fas




Hello.  I would like to read up on the current state of the art of
molecular biological applications of information theory.  What are the
areas of intense current research?  What do the experts in the field
consider to be the more important unsolved problems?  I am
particularly interested in learning about research efforts which aim
towards practical applications or which at least have an experimental
component (i.e. I am, at the moment, less interested in "purely
theoretical" questions).  Can someone point me to a good review
article?

Many thanks in advance.

Gabriel Berriz

From owner-info-theory@net.bio.net Sat Feb 18 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!bloom-beacon.mit.edu!spool.mu.edu!torn!alf.uwaterloo.ca!watdragon.uwaterloo.ca!daisy.uwaterloo.ca!tromp
From: tromp@daisy.uwaterloo.ca (John Tromp)
Subject: Re: Estimation of Entropies from a Finite Amount of Data
Message-ID: <D49oF1.4HD@watdragon.uwaterloo.ca>
Keywords: Chaitin, Shannon, algorithmic complexity, law of iterated logarithm
Sender: news@watdragon.uwaterloo.ca (USENET News System)
Nntp-Posting-Host: daisy.uwaterloo.ca
Organization: University of Waterloo
References: <3i24uq$53f@hamilton.maths.tcd.ie> <3i2ubm$6ru@news.u.washington.edu> <3i5eov$c0s@hamilton.maths.tcd.ie>
Date: Sun, 19 Feb 1995 21:45:00 GMT
Lines: 56

In article <3i5eov$c0s@hamilton.maths.tcd.ie>, tim@maths.tcd.ie (Timothy Murphy) writes:
> hillman@math.washington.edu (Christopher Hillman) writes:
> 
> >I don't understand how this relates to the fact that the Chaitin complexity K(s)
> >of finite strings s is only defined up to a BOUNDED FUNCTION g(s), in this
> >sense: if you compare the complexities defined relative to two different
> >universal Turing Machines, say K_U(s) and K_V(s), then
> 
> >       |K_U(s) - K_V(s)| \leq g(s)
> 
> >where g(s) is a non-negative bounded function on strings (depending on U,V).
> >I am pretty sure that g is itself an uncomputable function.  (Hopefully the
> >computer scientists will correct me if I am wrong about this.) 
> 
> Well, I'm not a computer scientist,
> but I'm sure this is wrong.
> In fact
> 
>        |K_U(s) - K_V(s)| \leq C

I think the confusion stems from the observation that the \leq above
was meant to be an equality in the subsequent remarks.
So one should define g as

   g(s) = |K_U(s) - K_V(s)|

Then g is bounded but possibly uncomputable.
Of course, it's easy to come up with choices of U and V such that g(s)
*is* computable, so the interesting question is whether there are choices
of U and V for which g(s) can be *proven* to be uncomputable.
This is indeed possible.
Let U be any universal TM.
V will halt only for inputs of the form x0 or x11.
In either case it will simulate U on x. Suppose U halts on x with output y.
Now if V's input was x11 it will just output y and halt. If, on the other
hand, V's input was x0, it will simulate U again on y. Then, if U halts,
it will output y and halt.
This pair of universal machines U,V gives rise to a function g(s) which
is always 1 or 2. Deciding whether g(s) = 1 is not possible though,
since this entails deciding whether s is a halting program.

> It might be that one could find a machine U
> for which let us say \sum_V d(U,V)^{-3} was a minimum.
> I would regard that as an objective criterion for choosing U.

Actually this approach doesn't work, since for any machine U
there will be infinitely many machines V that are functionally
identical to U, and so the above sum diverges.

regards,

%!PS			    %  -John Tromp (tromp@daisy.uwaterloo.ca)
42 42 scale 7 9 translate .07 setlinewidth .5 setgray/c{arc clip fill
setgray}def 1 0 0 42 1 0 c 0 1 1{0 3 3 90 270 arc 0 0 6 0 -3 3 90 270
arcn 270 90 c -2 2 4{-6 moveto 0 12 rlineto}for -5 2 5{-3 exch moveto
9 0 rlineto}for stroke 0 0 3 1 1 0 c 180 rotate initclip}for showpage

From owner-info-theory@net.bio.net Sat Feb 18 22:00:00 1995
Path: biosci!bloom-beacon.mit.edu!spool.mu.edu!howland.reston.ans.net!pipex!uknet!daresbury!not-for-mail
From: gad yagil <LCYAGIL@WEIZMANN.weizmann.ac.il>
Newsgroups: bionet.info-theory
Subject: complexity of (AG)n
Date: 19 Feb 1995 22:36:01 -0000
Lines: 39
Sender: lpddist@mserv1.dl.ac.uk
Distribution: bionet
Message-ID: <3i8h4h$998@mserv1.dl.ac.uk>
X-Acknowledge-To: <LCYAGIL@WEIZMANN>
Original-To: bio-info@dl.ac.uk, sre@al.cam.ac.uk



Sean Eddy writes:

> Take the DNA string "AGAGAGAGAGAGAG".  This is a pretty common sort of
> thing in a metazoan genome: reiterated simple sequence repeat,
> sometimes for hundreds or thousands of bases.

> Schneider would argue, a la Shannon, that there's roughly 2
> bits/position in this sequence. Seems that Shannon relative entropy is
> always going to be a safe upper bound. But I could restate that
> sequence as (AG)^7, and therefore communicate it in fewer bits than
> Shannon predicts.  This is Murphy's argument, ....................

> It seems
> entirely reasonable to consider algorithmic entropy measures --
> though, myself, I don't know how to begin calculating an algorithmic
> entropy for DNA sequence.  (Anyone?)

  Certainly. I happened to calculate the algorithmic complexity of
the very string you list - (AG)n, only with n=8 rather than n=7.
The calculation compares the complexity of (AG)8 with that of
GAAAAGGAAAGGGAGG, which has 16 symbols, 8 As and 8 Gs, like (AG)8
[and also the SAME Maxwell-Boltzmann ENTROPY!].

The details are in:
Yagil, G. (1985). J. Theor. Biol. 112: 1-23.
The result is C=2 for (AG)8 and C=16 for the "random" string,
which reflects well the difference in their compositional complexity.
For the complete set of rules used for the calculation, and the procedure
employed, I suggest you consult the paper. As I have your address I shall
send you the paper. I shall be glad to answer queries, or to hear
criticism.
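
The point about composition can be illustrated with a short Python sketch.
It is not the rule set of the 1985 paper; the helper names and the use of
the shortest repeating unit as a stand-in for algorithmic complexity are
assumptions made here for illustration only. Both strings have the same
base composition and therefore the same single-symbol Shannon entropy,
while a description in terms of a repeated unit separates them.

```python
from collections import Counter
from math import log2

def entropy_per_symbol(s):
    """Shannon entropy (bits/symbol) of the single-letter frequencies."""
    n = len(s)
    return -sum((c / n) * log2(c / n) for c in Counter(s).values())

def shortest_repeat_unit(s):
    """Shortest prefix that rebuilds s when repeated a whole number of times."""
    n = len(s)
    for p in range(1, n + 1):
        if n % p == 0 and s[:p] * (n // p) == s:
            return s[:p]

periodic = "AG" * 8
scrambled = "GAAAAGGAAAGGGAGG"

for name, s in (("(AG)8", periodic), ("scrambled", scrambled)):
    unit = shortest_repeat_unit(s)
    print(name, entropy_per_symbol(s), "bits/symbol; unit length", len(unit))
```

The unit lengths found by this toy measure are a property of the sketch
itself and should not be read as a reproduction of the paper's calculation.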

Yours,

Gad Yagil,
The Weizmann Institute, Rehovot, IL
LCYAGIL@WEIZMANN.WEIZMANN.AC.IL

From owner-info-theory@net.bio.net Sun Feb 19 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!swrinde!news.uh.edu!uuneo.neosoft.com!Starbase.NeoSoft.COM!mckee
From: mckee@starbase.neosoft.com (George McKee)
Newsgroups: bionet.info-theory
Subject: Re: Estimation of Entropies from a Finite Amount of Data
Date: 20 Feb 1995 16:24:23 GMT
Organization: NeoSoft Internet Services   +1 713 968 5800
Lines: 42
Distribution: world
Message-ID: <3iafnn$r0f@uuneo.neosoft.com>
References: <3i24uq$53f@hamilton.maths.tcd.ie> <3i2ubm$6ru@news.u.washington.edu>
NNTP-Posting-Host: starbase.neosoft.com
X-Newsreader: TIN [version 1.2 PL2]

I apologize for the attituding, but this has bothered me for a long time.

Christopher Hillman (hillman@math.washington.edu) wrote:
...
: It is possible to completely specify the design of a universal Turing machine,
: and more than one design is possible.  Nothing essential is lost if you think
: of universal Turing machines as digital computers; now it is obvious that
: the complexities defined by an IBM PC and a Macintosh will disagree in some
: complicated (but bounded) way, and it is also obvious that there is no
: theoretical obstruction specifying a universal Turing machine (although
: in practice it is not easy unless you are truly expert at designing digital
: computers!).

Congratulations! You have made contact with physical reality!  The next
steps (and they're biggies) involve recognizing that the laws of quantum
mechanics for solid-state doped silicon define a space of all possible
Macintoshes and PCs (and DEC Alphas and Crays and Intel Paragons, etc.).
These laws implicitly define a Universal Turing machine because they
can be used to build a traditional UTM like a PC.  Then for each function
or dataset you're interested in, these laws can be searched
(nondeterministically) to find the minimal piece of silicon capable
of producing that function or dataset.  Or without loss of generality,
the minimal structure of mass-energy to do it, thus permitting your
computer to include optical interconnects and suchlike.

The point here is that the laws at this physical level apply equally to
non-silicon structures such as DNA and proteins.  To actually compute
values for the total information in biological macromolecules in comparison
to the information in your PC you'll have to take a theoretical route
through quantum chemistry instead of solid-state physics.  But there is
a point of convergence that permits you to avoid choosing the state-
space of your UTM arbitrarily.   And surprisingly, you end up with a
measure of information with units, and for Shannon information these
should be the same as the units of entropy.  For Kolmogorov/Chaitin
information, I don't know what units you'll get.

	Cheers,
	- George McKee

--
Internet: mckee@neosoft.com
Voice: +1 713 890 8122

From owner-info-theory@net.bio.net Sun Feb 19 22:00:00 1995
Path: biosci!SNFMA1.IF.USP.BR!szeinfel
From: szeinfel@SNFMA1.IF.USP.BR (Rafael Iosef Najmanovich Szeinfeld {S)
Newsgroups: bionet.info-theory
Subject: Re: telomeres
Date: 20 Feb 1995 04:19:35 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 82
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <Pine.SUN.3.91.950220090759.2638C-100000@snfma1.if.usp.br>
References: <3i5rfo$crj@mserv1.dl.ac.uk>
NNTP-Posting-Host: net.bio.net

On 18 Feb 1995, gad yagil wrote:

> 
> Tim Murphy writes:
> | My question is simply whether H(s) where s is the DNA string
> | could make a sensible definition of the "complexity" of a creature.
> 
> Tom Schneider answers:
> > I believe that that is approaching the problem in a way
> > that is fruitless. Lots has been written about it and no measurements
> > and no conceptual advances have been made!  We already know how
> > much DNA there is, and CoT curves of the "complexity" were done 30
> > years ago.  It doesn't teach us anything new.
> > (If you don't believe that, then tell me something new!)
> 
> Here is an example which may teach us something new:
> 
> These are the first 363 bases of the complete nucleotide sequence of
> yeast chromosome III:  (The end regions of a chromosome are called
> telomeres):
> 
>        1  CCCACACACC ACACCCACAC CACACCCACA CACCACACAC ACCACACCCA
>       51  CACACCCACA CCACACCACA CCCACACCAC ACCCACACAC CCACACCCAC
>      101  ACACCACACC CACACACACC ACACCCACAC ACACCCACAC CCACACACCA
>      151  CACCCACACA CACACCACAC CCACACACAC CACACCACAC CCACACCACA
>      201  CCCACACCCA CACACCACAC CCACACCCAC ACCCCACACC CACACACCAC
>      251  ACCCACACAC ACCACACCCA CACACACCCA CACCACACCC ACACACCACA
>      301  CCCACACACC CACACCCACA CACACCACAC CCACACCACA CCCACACCCA
>      351  CACACCCACA CCC (TAACAC....
> 
> This telomeric region, like other telomeric regions so far, is
> dominated by a single short motif: CCACAC. It is clearly less complex
> than most of genomic DNA. Here is the shortest program I managed to
> formulate today for the sequence:
> 
> B = CCACAC           6   (A is adenine, C is Cytosine;
> D = CCCACAC          7    The complementary strand is
> E = ACA              3    naturally all G,T).
> H = CACA  (=CE)      2
> P = BB               2
> 
> 1         DAPHDAEPADHPHDADBAPEPEDBAPAHPEBHDH    208
> 209       DBABPPAPEPEABHDAPABDEPHDBADCC         363
> 
> Substitution of the 5 definitions above into this 63 letter string yields
> uniquely the complete telomere sequence. This string is probably not the
> shortest description of the sequence. The length of the algorithm is 83 steps -
> 63 to print the string and 20 more algorithmic steps involved in
> the five definitions listed. This yields a relative complexity of
> 83/363 = 0.228. I challenge the mathematicians
> on this net to find and prove (if provable) a "super"algorithm to
> determine the real shortest program.
> 

	This example is quite interesting, since you have a sequence 
that is obviously organized, so you expect it to contain a large amount of 
information (which may seem to be directly proportional to complexity), 
but using this definition of complexity (the ratio between the minimal 
amount of data that uniquely describes your system and the amount of data 
actually used to describe it) you find a small complexity ratio. 
	The point is: if you can properly describe your system in a 
simple form, is it really complex, or were you counting noise as 
information?
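
The bookkeeping in the quoted program can be checked mechanically. The
Python sketch below is an illustration under stated assumptions: the
dictionary RULES, the function expand, and the cost convention (one step
per letter of the main string, plus the step counts listed beside each
definition) are names and readings introduced here, not code from the
original post. It recomputes the 63 + 20 = 83 steps and expands the main
string so that the result can be compared with the sequence.

```python
# Definitions as listed in the quoted program; A and C stand for themselves.
RULES = {
    "B": "CCACAC",
    "D": "CCCACAC",
    "E": "ACA",
    "H": "CE",   # CACA written as C followed by E
    "P": "BB",
}
# Step counts as listed beside each definition in the post.
DEFINITION_COST = {"B": 6, "D": 7, "E": 3, "H": 2, "P": 2}

# The two lines of the main string, joined.
MAIN = ("DAPHDAEPADHPHDADBAPEPEDBAPAHPEBHDH"
        "DBABPPAPEPEABHDAPABDEPHDBADCC")

def expand(symbols):
    """Recursively substitute definitions until only bases remain."""
    return "".join(expand(RULES[ch]) if ch in RULES else ch for ch in symbols)

steps = len(MAIN) + sum(DEFINITION_COST.values())
sequence = expand(MAIN)

print("main string letters:", len(MAIN))
print("total steps:", steps)
print("relative complexity:", round(steps / 363, 3))
print("expanded length:", len(sequence))
```

If the expanded length printed here differs from 363, a letter was
probably lost somewhere in transcription; the step count itself does not
depend on that.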

> (rest of the message deleted)

> Gad yagil
> Dept. of Cell Biology,
> The Weizmann Institute of Science, Rehovot,IL
> lcyagil@weizmann.weizmann.ac.il
> 
> 
	Sincerely yours,
				rafael.
*---------------------------------------------------------------------*
* Rafael Iosef Najmanovich Szeinfeld | Depto. de Fisica Geral         *
* Statistical Mechanics Group        | Instituto de Fisica            *
* General Physics Department         | Universidade Sao Paulo         *
* Physics Institute                  | Rua do Matao s/n CEP 01452-990 *     
* University of Sao Paulo - Brazil   | Caixa Postal 20516             * 
* E-mail: szeinfel@snfma1.if.usp.br  | Sao Paulo - Brasil             *
*---------------------------------------------------------------------*

From owner-info-theory@net.bio.net Sun Feb 19 22:00:00 1995
Path: biosci!galaxy.ucr.edu!ihnp4.ucsd.edu!swrinde!howland.reston.ans.net!Germany.EU.net!ieunet!maths.tcd.ie!not-for-mail
From: tim@maths.tcd.ie (Timothy Murphy)
Newsgroups: bionet.info-theory
Subject: Re: Estimation of Entropies from a Finite Amount of Data
Date: 20 Feb 1995 03:03:25 -0000
Organization: Dept. of Maths, Trinity College, Dublin, Ireland.
Lines: 25
Message-ID: <3i90pt$h31@hamilton.maths.tcd.ie>
References: <3i24uq$53f@hamilton.maths.tcd.ie> <3i2ubm$6ru@news.u.washington.edu> <3i5eov$c0s@hamilton.maths.tcd.ie> <D49oF1.4HD@watdragon.uwaterloo.ca>
NNTP-Posting-Host: hamilton.maths.tcd.ie
Keywords: Chaitin, Shannon, algorithmic complexity, law of iterated logarithm

tromp@daisy.uwaterloo.ca (John Tromp) writes:

>> It might be that one could find a machine U
>> for which let us say \sum_V d(U,V)^{-3} was a minimum.
>> I would regard that as an objective criterion for choosing U.

>Actually this approach doesn't work, since for any machine U
>there will be infinitely many machines V that are functionally
>identical to U, and so the above sum diverges.

I made it clear (or should have done)
that I was considering Turing machines as black boxes,
so that machines which are "functionally identical"
would be considered as one.
If you prefer, the sum is taken over classes
of functionally identical machines.

That said, the actual sum above was just pulled out of the air;
I have no idea if it actually converges.

-- 
Timothy Murphy  
e-mail: tim@maths.tcd.ie
tel: +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland

From owner-info-theory@net.bio.net Sun Feb 19 22:00:00 1995
Path: biosci!bloom-beacon.mit.edu!mojo.eng.umd.edu!cs.umd.edu!news.umbc.edu!haven.umd.edu!gamera.umd.edu!cbl.umd.edu!not-for-mail
From: ulan@cbl.umd.edu (Robert Ulanowicz)
Newsgroups: bionet.info-theory
Subject: Re: physical meaning of information.
Date: 20 Feb 1995 09:45:04 -0500
Organization: Chesapeake Biological Laboratory
Lines: 59
Message-ID: <3ia9tg$g7d@cbl.umd.edu>
References: <3i24uq$53f@hamilton.maths.tcd.ie> <Pine.SUN.3.91.950217125304.4295A-100000@snfma1.if.usp.br>
NNTP-Posting-Host: cbl.umd.edu

References: <3i24uq$53f@hamilton.maths.tcd.ie> <Pine.SUN.3.91.950217125304.4295A-100000@snfma1.if.usp.br> <edb-ht.793038469@find2.denet.dk>

edb-ht@find2.denet.dk (H Thygesen) writes:


>There are at least 100 different answers to that question. Here comes one of them:

>Within probability theory, we distinguish between two schools: the
>subjectivists, who claim that probability only reflects human beings' lack
>of ability to predict the future (because we don't have access to all
>information), and the objectivists, who claim that it makes sense to speak
>about the "true" probability that something happens.

>As entropy is (or at least: can be) defined in terms of probability
>distributions, an objectivist would see entropy and information as
>something physical, whereas a subjectivist would not.

Helge is certainly right - there are at least a hundred answers to your 
question. I'll briefly add my 2 cents.

I believe inordinate confusion was caused by the names Claude Shannon gave
to concepts. (Don't get me wrong, I certainly admire Shannon's genius. I
just wish his chosen terminology had been less disastrous! :-)

For years I found it confusing that uncertainty and information seemed to 
be one and the same quantity. For that very reason, I avoided IT whenever 
possible. It wasn't until I realized somehow that information is always 
represented by a DIFFERENCE in uncertainties, that things began to make 
sense. (The Shannon "entropy" can be regarded as an implicit difference, 
whenever it is being used to refer to information. In its "absolute" 
form it represents only uncertainty, NOT information.)
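
A small Python sketch of information as a difference of uncertainties. The
four-symbol alphabet and the particular before/after distributions are
made-up illustrative values, not data from this thread, and the function
name uncertainty is likewise just a label chosen here.

```python
from math import log2

def uncertainty(probs):
    """Shannon uncertainty H = -sum p*log2(p), in bits (0*log 0 taken as 0)."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Before: four symbols equally likely.
before = [0.25, 0.25, 0.25, 0.25]
# After: two of the four have been ruled out.
after = [0.5, 0.5, 0.0, 0.0]

h_before = uncertainty(before)
h_after = uncertainty(after)
information = h_before - h_after  # the DIFFERENCE is the information

print(h_before, h_after, information)
```

Neither h_before nor h_after alone says how much was learned; only their
difference does.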

I used the term "entropy" most reluctantly. John von Neumann suggested the
term to Shannon as a joke. Shannon took him seriously, and we've had to
live with the fallout ever since! In a nutshell, Shannon's measure is
formally the same as Boltzmann's, but most people conveniently forget that
Boltzmann applied the formula only under certain constraints (conservation
of energy, ergodicity, etc.) that are NOT generally applicable. People see
the formal identity, and are induced (seduced?) by the name Shannon gave
his measure into thinking the two concepts are the same. The measure
itself is far more general than Boltzmann's application, and it, quite
simply, is a logical mistake (arguing from the particular to the general)
to invoke the phenomenological authority of the second law every time one
sees the formula!

Helge's point is also well taken. In this day and age of increasing
Postmodern subjectivity, the tendency as regards probabilities (and hence
information) is decidedly countercurrent, i.e., in the direction of
objectification. To emphasize the latter point, John Collier bids us use
the term "indeterminacy" for Shannon's measure, rather than "uncertainty".
The use of "entropy" should be avoided insofar as possible! 

Cheers,
Bob Ulanowicz
ulan@cbl.umd.edu






From owner-info-theory@net.bio.net Sun Feb 19 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!adam.cc.sunysb.edu!news.nysernet.net!news.sprintlink.net!howland.reston.ans.net!torn!alf.uwaterloo.ca!watdragon.uwaterloo.ca!daisy.uwaterloo.ca!tromp
From: tromp@daisy.uwaterloo.ca (John Tromp)
Subject: Re: Estimation of Entropies from a Finite Amount of Data
Message-ID: <D4BH8L.5or@watdragon.uwaterloo.ca>
Keywords: Chaitin, Shannon, algorithmic complexity, law of iterated logarithm
Sender: news@watdragon.uwaterloo.ca (USENET News System)
Nntp-Posting-Host: daisy.uwaterloo.ca
Organization: University of Waterloo
References: <3i24uq$53f@hamilton.maths.tcd.ie> <3i2ubm$6ru@news.u.washington.edu> <3i5eov$c0s@hamilton.maths.tcd.ie> <D49oF1.4HD@watdragon.uwaterloo.ca> <3i90pt$h31@hamilton.maths.tcd.ie>
Date: Mon, 20 Feb 1995 21:05:08 GMT
Lines: 38

In article <3i90pt$h31@hamilton.maths.tcd.ie>, tim@maths.tcd.ie (Timothy Murphy) writes:
> tromp@daisy.uwaterloo.ca (John Tromp) writes:
> 
> >> It might be that one could find a machine U
> >> for which let us say \sum_V d(U,V)^{-3} was a minimum.
> >> I would regard that as an objective criterion for choosing U.
> 
> >Actually this approach doesn't work, since for any machine U
> >there will be infinitely many machines V that are functionally
> >identical to U, and so the above sum diverges.
> 
> I made it clear (or should have done)
> that I was considering Turing machines as black boxes,
> so that machines which are "functionally identical"
> would be considered as one.
> If you prefer, the sum is taken over classes
> of functionally identical machines.

That still doesn't solve the problem, since for any machine U, there
are infinitely many machines V_0, V_1, ... such that V_i is almost
functionally identical to U, and none of the V_i is functionally
identical to another V_j.

More precisely, V_i halts only on inputs of the form
x00 or x1. It simulates U on x; suppose U halts with output y.
If y <> i, it will output y and halt. If y=i though, it will only
output y and halt if the input was x00. Thus all strings except i will incur
a penalty of 1 relative to U, and the string i will incur a penalty of 2.

This is enough to make the above sum, or anything like it, diverge.

regards,

%!PS		    %  -John Tromp (http://daisy.uwaterloo.ca/~tromp)
42 42 scale 7 9 translate .07 setlinewidth .5 setgray/c{arc clip fill
setgray}def 1 0 0 42 1 0 c 0 1 1{0 3 3 90 270 arc 0 0 6 0 -3 3 90 270
arcn 270 90 c -2 2 4{-6 moveto 0 12 rlineto}for -5 2 5{-3 exch moveto
9 0 rlineto}for stroke 0 0 3 1 1 0 c 180 rotate initclip}for showpage

From owner-info-theory@net.bio.net Mon Feb 20 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!ns1.faseb.org!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: Book Reviews of Information Theory and Molecular
Message-ID: <D49s4w.G9v@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
Date: Sun, 19 Feb 1995 23:05:20 GMT
Lines: 47

Sorry I didn't respond to this earlier.  Our news reader is on the blink again
and our systems administrators aren't keeping up with it.

| |> It is already known that the complexity (in the Britten and Kohne sense, which
| |> is really just an information measure!) of some organisms is higher than
| |> humans.  It's just another example to deflate our enlarged egos about our place
| |> in the universe.  ;-)
| 
| Any chance of your summarizing briefly the definition of this information
| measure?

The method is to extract DNA from an organism and (if I recall correctly) shear
it into pieces about 400 bases long.  Then the DNA is heated up to separate the
strands.  The DNA is then allowed to cool and strands will re-anneal to each
other.  The time that this process takes depends on the "complexity" of the
DNA.  The process can also be sped up by having a higher concentration of DNA.
So if the concentration (Co) goes up or the time (t) goes up, more of the DNA
has reannealed.  People plot the fraction of DNA not yet reannealed versus Co times t,
or Cot.  This is called a cot curve.  They look like this:

fraction not annealed
1 |*
  |   *
  |     *
  |      *
  |          *
0 |___________ cot

Annealing can be measured by any of several techniques like viscosity or
binding to hydroxyapatite.

For bacteria there is a single nice curve.  For higher organisms there are
lumps in the curve and this signalled the discovery of repeated units in the
DNA.  The place that the bulk of the curve drops is a measure of the
"complexity" of the organism, and you can read it off in base pairs.  That is,
it is probably an information measure, though I haven't proven that.
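
For readers who want to see the shape of such a curve, here is a Python
sketch of ideal second-order reassociation kinetics, in which the fraction
of DNA still single-stranded is 1 / (1 + k * Cot). The rate constant k and
the Cot values are placeholder numbers chosen for illustration, not
measurements, and the function names are introduced here; real curves for
higher organisms behave more like sums of several such components.

```python
def fraction_single_stranded(cot, k):
    """Ideal second-order reassociation: C/Co = 1 / (1 + k * Co * t)."""
    return 1.0 / (1.0 + k * cot)

def cot_half(k):
    """Cot value at which half of the DNA has reannealed."""
    return 1.0 / k

if __name__ == "__main__":
    k = 0.5  # hypothetical rate constant; a smaller k shifts the curve right
    for exponent in range(-2, 4):
        cot = 10.0 ** exponent
        f = fraction_single_stranded(cot, k)
        print(f"Cot = {cot:8.2f}   fraction not annealed = {f:.3f}")
    print("Cot at half reassociation:", cot_half(k))
```

In this idealized picture the midpoint of the curve, cot_half, is the
quantity that is read off as the "complexity".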

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www.ncifcrf.gov:2001/~toms
soon: 
  http://www-lmmb.ncifcrf.gov/~toms
works now: 
  http://www-lmmb.ncifcrf.gov/

From owner-info-theory@net.bio.net Mon Feb 20 22:00:00 1995
Newsgroups: bionet.info-theory,bionet.general
Path: biosci!ns1.faseb.org!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Invention: The Care and Feeding of Ideas
Message-ID: <D47x3q.261@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
Date: Sat, 18 Feb 1995 22:57:26 GMT
Lines: 25
Xref: biosci bionet.info-theory:3152 bionet.general:13657

I just finished reading:

@book{Wiener1993,
author = "N. Wiener",
title = "Invention: The Care and Feeding of Ideas",
publisher = "The MIT Press",
address = "Cambridge, Mass.",
comment = "from a manuscript dated June 1954",
year = "1993"}

It is a wonderful book.  Though written 40 years ago, it is completely
contemporary, strongly supporting individual researchers in the face of
"megabuck science".  If every government representative were to read it and
follow it, there would be a great burst of basic science.

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www.ncifcrf.gov:2001/~toms
soon: 
  http://www-lmmb.ncifcrf.gov/~toms
works now: 
  http://www-lmmb.ncifcrf.gov/

From owner-info-theory@net.bio.net Mon Feb 20 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!ns1.faseb.org!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: DNA Programming
Message-ID: <D47wor.20x@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
References: <famulus-1402950955540001@ppp-83-4.bu.edu> <D42sEn.AFD@ncifcrf.gov> <3hunuj$kbb@news.u.washington.edu>
Date: Sat, 18 Feb 1995 22:48:27 GMT
Lines: 49

In article <3hunuj$kbb@news.u.washington.edu> hillman@math.washington.edu
(Christopher Hillman) writes:

>In article <D42sEn.AFD@ncifcrf.gov>,
>toms@fcsparc6.ncifcrf.gov (Tom Schneider) writes:

| |> Part of the trouble is that using DNA alone, one has a hard
| |> time making nearly irreversible dissipations and so fixing the result is hard.
| |> Irreversible dissipations require more molecular machinery - like proteins that
| |> burn ATP.  That would be harder to build.
| 
| This is a good point.  What do you think of the idea of trying to build a
| ``biomolecular Turing machine'' consisting of an enzyme complex (=read/write
| head) working its way along a DNA chain (= tape)?  I would guess that
| there is no theoretical reason to rule such a device out, since this sounds
| so much like things that occur in bacteria every day, but that the technology
| to design such fine tuned enzyme complexes is decades in the future--- unless
| someone in a lab has a lucky break and finds a natural complex which ``almost
| works''; then it would probably be quickly modified to do just what we want.

That was suggested by Charles Bennett:

@article{Bennett1982,
author = "C. H. Bennett",
title = "The Thermodynamics of Computation - A Review",
journal = "Int. J. Theor. Phys.",
volume = "21",
pages = "905-940",
year = "1982"}

Unfortunately the design problem even for a single protein looks very
difficult.  I sometimes wonder if one could figure out a selective scheme to
create the thing, but none is obvious to me.  One problem of using natural
systems is that they synthesize in only one direction.  So you have only
copying/translation but no direct feedback.  There are little motors that run
both ways along microtubules, but they probably don't have multiple states.  It
looks like either it will be hard and take years or reasonably easy but have to
be done by a brilliant mind ...

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www.ncifcrf.gov:2001/~toms
soon: 
  http://www-lmmb.ncifcrf.gov/~toms
works now: 
  http://www-lmmb.ncifcrf.gov/

From owner-info-theory@net.bio.net Mon Feb 20 22:00:00 1995
Path: biosci!ANS.NET!jta
From: jta@ANS.NET (John Amenyo)
Newsgroups: bionet.info-theory
Subject: Re: DNA Programming
Date: 21 Feb 1995 14:16:17 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 241
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <CMM.0.90.2.793404964.jta@foo.ans.net>
NNTP-Posting-Host: net.bio.net



> In article <famulus-1402950955540001@ppp-83-4.bu.edu>
> famulus@acs.bu.edu (Alex Kasman) writes:

> Some kind reader of this newsgroup has suggested that I post my
> (unpopular) opinions here to start an "interesting" discussion.  Let me
> start by saying:

Welcome aboard.

>
> 1) I am a mathematician by profession, but I do know quite a bit about
> biology and have some training as a biologist.

I believe that to have an in-depth understanding of "biocomputation" one also
needs to put on the "engineering" hat once in a while or, better yet,
have access to such expertise. Besides the "mathematics" and the "science",
a lot of "engineering" would be needed for "biocomputers" (of any kind)
to become viable, if ever. I have in mind the "engineering attitudes" of
such scientists and mathematicians as N. Wiener, J. von Neumann,
S. Ulam and C.E. Shannon.

> 2) I think making use of DNA for programming has tremendous potential
> (just look at the wonderful DNA programs that evolution has written), but
> (HERE IS THE UNPOPULAR PART) ...

> 3) I was not pleased with the recent "hype" concerning the work of Adleman
> [Science, November 1994]  which I am told has been receiving much
> discussion in this group.  (I haven't been reading.)

I suppose I am one of those Simplicios [apologies to Galileo]
who is unapologetically adding to the "hype". But that's alright - 
"our strength lies in our diversity and also our shared interests."


I suggest you DO take the time to read what the discussion has been in the
group. I believe some of your concerns/questions have been raised and
debated.

> The paper by Adleman is useful and interesting, and it has brought
> attention to the potential for developing molecular computing devices.
> However, the particular results seem to be a better example of
> "inverse" applied mathematics than of programming or the sort of
> molecular Turing machine which he mentions.

I believe your fundamental concern is about the following: 
what is a "computer" or "computation"?

From your statements/assertions, you don't seem to believe
that Adleman was demonstrating a "computation".


Mind you these kinds of debates never go very far, mainly because we humans
are typically dealing with "one-of-a-kind" phenomena in these situations.
They are also well captured by the story, from the Indian subcontinent,
of "The Elephant and the Twelve Blind Men".

For example:

	1. What is "life"?
		In the sense of: is carbon-based "life" the only kind possible?
		If we landed a robot spacecraft on a planet in another solar
		system, would it (as our proxy) be able to "recognize"
		and appreciate "life" that is not carbon-based?

		Is the emerging research discipline of "artificial life"
		significant or just a scam for a few adherents to get money
		and other resources from funding institutions?

		Is the "search for extra-terrestrial life" a delusion of 
		grand proportions?

	2. What is "intelligence"?
		In the sense of is non-human "intelligence" possible?
		Are there non-human animals (primates or other) that are
		"intelligent"?

		Is the decades old discipline of "artificial intelligence"
		significant or just a scam for a few adherents to get money
		and other resources from funding institutions?

		Is the search for "cetacean intelligence" one of those directions
		of pseudo-scientific mumbo-jumbo?

	3. What is "civilized human culture"?
		Does it have to be based on "infrastructures" built using
		metal-based "technologies"?
		Does it have to include "writing" and "reading"?
		Can a human culture be justifiably "advanced" without
		"mathematics" and "science"? Or are all "non-technological"
		societies by definition "primitive" and "without culture
		at all"?

		Note that attitudes of exclusion/inclusion classification
		in this particular area of comparative ethnology
		have been responsible for countless human deaths and
		destruction in empire-building, colonization and apartheid.

		So this is not merely a question of philosophical spinning
		of wheels about categorization.

Now we can ask rather grandly:

	4. What is "computation"? What is a "computer"?
		What are the essential aspects of related terms such as:
		"computer", "calculation", "calculator",  "algorithm"
		"information" and "information processing"?
		What is "programming", a "procedure", "code", "routine",
		"software", "manipulation", "automation",
		"protocol" and "script"?

		What do "analog computer", "digital computer", 
		"electronic computer", "optical computer",  "molecular
		computer" and "bio-computer" have (or can have) in common?

		Can a "computer" be built from cups and beans?
		Take this seriously and try it!

		Does a "computation" have to (necessarily) involve 
		"mathematics", "numerical calculation"?  
		Or is a "computation" more akin to the "behaviour" or
		dynamical activity of a "formal system"?

		What is "mathematics"? Are there formal systems/schemes
		that are "non-mathematical"?  What about "pattern matching
		and processing" systems? What about "linguistic/language
		processing systems"?

		Are formal systems more related to "meta-mathematics" than
		to "mathematics"?

		Well, Horatio, there are more ...
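
One way to take the cups-and-beans question seriously is a counter
machine: each cup holds some number of beans, and the only instructions
are "add a bean to cup c" and "if cup c is empty, jump; otherwise remove
a bean".  Minsky showed that machines of this kind, given unbounded beans,
can simulate any Turing machine.  The sketch below is only an illustration;
the instruction names, the run() function and the sample program are
invented here and do not come from any posting in this thread.

```python
def run(program, cups, max_steps=10_000):
    """Run a cups-and-beans (counter machine) program.

    program is a list of instructions (names are illustrative):
      ("add", c, nxt)             put one bean in cup c, go to nxt
      ("take", c, nxt, if_empty)  if cup c is empty go to if_empty,
                                  else remove one bean and go to nxt
    The machine halts when the instruction index leaves the program.
    """
    cups = list(cups)
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            return cups
        instr = program[pc]
        if instr[0] == "add":
            _, c, pc = instr
            cups[c] += 1
        elif instr[0] == "take":
            _, c, nxt, if_empty = instr
            if cups[c] == 0:
                pc = if_empty
            else:
                cups[c] -= 1
                pc = nxt
        else:
            raise ValueError(f"unknown instruction {instr!r}")
    raise RuntimeError("step limit reached")

# Addition: move the beans of cup 1 into cup 0, one at a time.
ADD = [
    ("take", 1, 1, 2),  # 0: cup 1 empty? then halt (index 2); else remove a bean
    ("add", 0, 0),      # 1: drop that bean into cup 0 and loop
]

if __name__ == "__main__":
    print(run(ADD, [3, 4]))
```

Nothing in the interpreter refers to numbers as such: the "computation"
is just the rule-following movement of beans between cups.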

My favourite book for these kinds of discussions is:

	I. Lakatos, "Proofs and Refutations".

I recommend it highly, it classifies and describes succinctly the various
kinds of positions that the debaters  of these "angels-on-a-pin" issues
take.

> Note (added to news posting, not to letter): I realize that a calculator
> also is just a system whose behavior is the same as a mathematical system
> (arithmetic) and that this is why it is useful.  However, I think the
> significant thing is that, since we have developed circuits that are
> equivalent to the boolean operators and the functions of a Turing machine,
> we can make electrical circuits do anything (within reason).  On the other
> hand, I see nothing like this in Adleman's work; just the fact that DNA
> anneals only for particular base pairings.

Your statement about the absence of boolean circuits is actually very easy 
to answer. Implementation of Boolean (2-valued) logic is not an essential
attribute of a (human-engineered) "computer".  Mind you, one could always
indulge in a Russellian scheme of translating any "computation" into Boolean
operations but this may not be useful to anybody.

It is only necessary that
a "computer" be an "interpreter" of "signs" of a "semiotic" scheme.

In one sense, one has a closed system of "objects"/"entities" together
with "operations"/"manipulations" on them. Some of these operations can
be interpreted as attributes/properties/valuations of the objects. Other
operations may be relations, including complex ones such as "protocols".
Some of the other operations may be mappings/translations/morphisms
(generally using the objects as "interpretants" or "representations" of
other systems of objects). Even the material embodiments (mechanizations or
implementations) of "abstract" or "universal algebras" will not be adequate 
mathematical models for a "computer".

If you are interested, there is substantial work on formal models
of "computation" available from the works of A. Turing (Turing machines),
A. Church (lambda-calculus), A. Markov (algorithms), E. Post (production
systems), S. Kleene (finite state automata) and H. Curry (combinatory
logic). There are also the "analog computer" models discussed by
N. Wiener (cybernetics) and Vannevar Bush.

There are also more modern attempts at defining/modeling "parallel",
"concurrent" and "distributed" computers/computation.

It is quite easy to engineer computers that rely on non-boolean operations,
the so called multi-valued logic (including probabilistic,
variable-valued and fuzzy logic). As you may be aware, a "computer" does
not even have to do "numerical calculations".

Even more significantly, there are existing "computers" in which one would be
hard pressed to identify the embodiments of boolean operations in their
operations. Actually one would be hard pressed to identify their
"digital" aspects.
For example, "optical computers" performing (direct) Fourier transforms
or wavelet analysis for image/picture (signal) processing.

I have mentioned this before, as have others. It is actually very silly
to take a sophisticated system like the biological neuron and then try
to use it to build boolean logic gates/circuits, then build up the hierarchy
of ALU, MMU, FPU, RAM, ROM, etc., and eventually obtain a "traditional
computer". This is the basis of my personal discomfort with the current published
approaches to "molecular electronics" and "nano-computers".

I believe it is also silly to take a sophisticated system like
DNA/enzyme interaction complexes to build boolean logic gates/circuits and
then build "computers" out of these.

Adleman's style of "computation" may be unfamiliar but that does not mean
it is/should be suspect. It remains to be seen if the specific/concrete
operations and activities that Adleman advocates will be viable in
(commercialized) computers or not. However, it is my belief that he has
demonstrated how to use the sophistication at the biological molecular
level as basic "computational primitives", instead of first building
boolean logic gates/circuits/automata out of them.
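
For readers who have not seen the Science paper, this style of
computation can be mimicked in software as "generate and filter": form a
large number of random paths through a directed graph, then discard every
path that fails a test, and answer "yes" if anything survives.  The sketch
below is an analogy only.  The graph, the function names and the sample
count are invented for illustration and are not Adleman's data, and
pseudo-random sampling stands in for what the DNA chemistry does in
parallel.

```python
import random

def random_path(edges, max_len, rng):
    """Grow one random walk along directed edges; stop at max_len or a dead end."""
    path = [rng.choice(sorted(edges))]
    while len(path) < max_len and edges[path[-1]]:
        path.append(rng.choice(edges[path[-1]]))
    return tuple(path)

def generate_and_filter(edges, start, end, samples=20000, seed=0):
    """Return the sampled paths that survive every filter (Hamiltonian paths)."""
    rng = random.Random(seed)
    n = len(edges)
    pool = {random_path(edges, n, rng) for _ in range(samples)}  # 1. many random paths
    pool = {p for p in pool if p[0] == start and p[-1] == end}   # 2. right start and end
    pool = {p for p in pool if len(p) == n}                      # 3. right number of vertices
    pool = {p for p in pool if set(p) == set(edges)}             # 4. every vertex visited
    return pool                                                  # 5. non-empty means "yes"

# Hypothetical 4-vertex directed graph: vertex -> list of successors.
EXAMPLE = {0: [1, 2], 1: [2, 3], 2: [1, 3], 3: []}

if __name__ == "__main__":
    print(sorted(generate_and_filter(EXAMPLE, start=0, end=3)))
```

No step in this procedure is a Boolean gate; the primitives are "make
paths" and "separate by a property", which is the point being argued here.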

The expansion of concepts of (embodiment/mechanization of) "computation"
is not different from those that have occurred with the concepts of 
"number" (e.g., Kummer), "algebra" (e.g., Hamilton, Grassmann, J.W. Gibbs,
O. Heaviside), "geometry" and extending "calculus" to "fractional calculus".
There is also the interesting story of K. Menger's attempt to extend
(metric) "geometry" to "probabilistic geometry" after his encounter with
relativity theory.


I posted earlier to the group one speculation about the general plausible
architecture of an Adleman-style "biocomputer". I have cleaned it up 
a tiny bit and it is available via anon. ftp from:

	file://ftp.ans.net/pub/misc/biocomputer.txt

May I suggest you take a look. Maybe it would provide you with some
insight into a "computer" that has no Boolean logic/gate in sight.


					jtA

===
Dr. John-Thones Amenyo
ANS CO+RE Systems
100 Clearbrook  Road
Elmsford, NY 10523

Fax:	914-789-5307
Email:	jta@ans.net
===

From owner-info-theory@net.bio.net Mon Feb 20 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!ns1.faseb.org!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: physical meaning of information.
Message-ID: <D4BGoJ.AyM@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
References: <3i24uq$53f@hamilton.maths.tcd.ie> <Pine.SUN.3.91.950217125304.4295A-100000@snfma1.if.usp.br>
Date: Mon, 20 Feb 1995 20:53:07 GMT
Lines: 36

In article <Pine.SUN.3.91.950217125304.4295A-100000@snfma1.if.usp.br>
szeinfel@SNFMA1.IF.USP.BR (Rafael Iosef Najmanovich Szeinfeld {S) writes:

| 	There is one point that does not permit me to fully accept information 
| theory. The point is whether information has any physical background or is 
| just a manipulation of ideas not directly related to nature. In the latter 
| case, why use different names for the same quantity (entropy) ?

I'm going to assume you read my previous postings about information being a
difference of uncertainties and uncertainty correlating to entropy (NOT
information correlating to entropy!!!).

The advantage of using information theory is that it brings us calculations
devoid of confusing issues of the physical implementation.  For example, by
working with sequence logos - which are an entirely symbolic method - I don't
have to be concerned with the actual binding energies between DNA and a
protein.  Having set that issue aside (for the moment) leaves the issue at hand
much more clear.  It is possible to do the same calculations with entropy or
probability, as we have discussed recently, but this clouds the fundamental
issues so much that most people get lost in the fog.  I recommend reading
Pierce1980.
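
As a minimal sketch of "information as a difference of uncertainties"
(it is not code from any of the papers): for one position in a set of
aligned DNA binding sites, take the uncertainty before binding to be
log2(4) = 2 bits, assuming the four bases are equally likely, and the
uncertainty after binding to be the Shannon uncertainty of the observed
base frequencies.  The function names and the example columns are made up
for illustration, and the small-sample correction is left out.

```python
import math
from collections import Counter

def uncertainty_bits(column):
    """Shannon uncertainty H = -sum(p * log2(p)) of the symbols in one column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_bits(column, alphabet_size=4):
    """Information = uncertainty before - uncertainty after, in bits."""
    h_before = math.log2(alphabet_size)  # assumes equiprobable symbols beforehand
    return h_before - uncertainty_bits(column)

if __name__ == "__main__":
    # One aligned position observed across eight hypothetical sites.
    for column in ("AAAAAAAA", "AAAATTTT", "ACGTACGT"):
        print(column, information_bits(column))
```

A fully conserved position carries 2 bits, a position split evenly between
two bases carries 1 bit, and a position where all four bases are equally
frequent carries none.  No binding energy appears anywhere in the
calculation.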

| 	Hope my [preoccupation] is not too naive.

Nope.

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www.ncifcrf.gov:2001/~toms
soon: 
  http://www-lmmb.ncifcrf.gov/~toms
works now: 
  http://www-lmmb.ncifcrf.gov/

From owner-info-theory@net.bio.net Mon Feb 20 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!ns1.faseb.org!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: Enthalpy and Entropy
Message-ID: <D49u80.GL7@ncifcrf.gov>
Summary: a quote
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
Date: Sun, 19 Feb 1995 23:50:24 GMT
Lines: 86

John Jungck (jungck@beloit.edu) sent a bunch of references to Ram Samudrala and
Ram kindly sent them on to me.  They are such a wonderful set that I am asking
John to post his list here.  It turns out that there are a number of nice
papers on the concept of entropy in J. Chem. Ed. 47(5).

Below are the ones I located.  They intersect with John's list, but include
some he did not give.  Note the consecutive page numbers:

@article{Raman1970,
author = "V. V. Raman",
title = "Evolution of the {Second Law} of Thermodynamics",
journal = "J. Chem. Edu.",
volume = "47",
pages = "331-337",
year = "1970"}

@article{Bent1970,
author = "H. A. Bent",
title = "The {Second Law}---How Much, How Soon, to How Many?",
journal = "J. Chem. Edu.",
volume = "47",
pages = "337-341",
year = "1970"}

@article{Craig1970,
author = "N. C. Craig",
title = "Our Freshmen Like the {Second Law}",
journal = "J. Chem. Edu.",
volume = "47",
pages = "342-346",
year = "1970"}

@article{Strong.Halliwell1970,
author = "L. E. Strong
 and H. F. Halliwell",
title = "An Alternative to Free Energy for Undergraduate Instruction",
journal = "J. Chem. Edu.",
volume = "47",
pages = "347-352",
year = "1970"}

@article{Nash.chem1970,
author = "L. K. Nash",
title = "Chemical Equilibrium as a State of Maximal Entropy",
journal = "J. Chem. Edu.",
volume = "47",
pages = "353-357",
year = "1970"}

@article{Nash.rev1970,
author = "L. K. Nash",
title = "Reversible and Irreversible Heating and Cooling",
journal = "J. Chem. Edu.",
volume = "47",
pages = "357-361",
year = "1970"}

@article{Craig1988,
author = "N. C. Craig",
title = "Entropy Analysis of Four Familiar Processes",
journal = "J. Chem. Edu.",
volume = "65",
pages = "760-764",
year = "1988"}

I read Raman1970 last night and it had this wonderful quote (in TeX notation):

``Une Machine dont le calorique est le moteur, ne peut produire du travail sans
l'emploi de deux sources de chaleur \`{a} des temp\'{e}ratures
diff\'{e}rentes.'' --- Sadi Carnot, 1824

("... a machine in which caloric is the motor (i.e., a heat engine) cannot produce
work without the use of two sources of heat at different temperatures.")

"This was the very first formulation of the Second Law of Thermodynamics."
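
In modern notation (not Carnot's own, and not taken from Raman1970), the
quantitative form of this statement for an engine that absorbs heat $Q_h$
from a reservoir at absolute temperature $T_h$, rejects heat to a
reservoir at $T_c \le T_h$, and delivers work $W$ is:

```latex
\[
  \eta \;=\; \frac{W}{Q_h} \;\le\; 1 - \frac{T_c}{T_h}
\]
```

When the two temperatures are equal the bound is zero, which is the
statement that no work can be produced without two sources of heat at
different temperatures.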

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www.ncifcrf.gov:2001/~toms
soon: 
  http://www-lmmb.ncifcrf.gov/~toms
works now: 
  http://www-lmmb.ncifcrf.gov/

From owner-info-theory@net.bio.net Mon Feb 20 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!news.sprintlink.net!uunet!in1.uu.net!nntp.cac.washington.edu!math.washington.edu!hillman
From: hillman@math.washington.edu (Christopher Hillman)
Newsgroups: bionet.info-theory
Subject: Re: Estimation of Entropies from a Finite Amount of Data
Date: 21 Feb 1995 02:48:54 GMT
Organization: "University of Washington, Mathematics, Seattle"
Lines: 46
Distribution: world
Message-ID: <3ibkam$6qp@news.u.washington.edu>
References: <3iafnn$r0f@uuneo.neosoft.com>
NNTP-Posting-Host: escher.math.washington.edu

In article <3iafnn$r0f@uuneo.neosoft.com>,
mckee@starbase.neosoft.com (George McKee) writes:

|> Christopher Hillman (hillman@math.washington.edu) wrote:
|> ...
|> : It is possible to completely specify the design of a universal Turing machine,
|> : and more than one design is possible.  Nothing essential is lost if you think
|> : of universal Turing machines as digital computers; now it is obvious that
|> : the complexities defined by an IBM PC and a Macintosh will disagree in some
|> : complicated (but bounded) way, and it is also obvious that there is no
|> : theoretical obstruction to specifying a universal Turing machine (although
|> : in practice it is not easy unless you are truly expert at designing digital
|> : computers!).
|> 
|> Congratulations! You have made contact with physical reality!  The next
|> steps (and they're biggies) involve recognizing that the laws of quantum
|> mechanics for solid-state doped silicon define a space of all possible
|> Macintoshes and PCs (and DEC Alphas and Crays and Intel Paragons, etc.).
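
The "complicated (but bounded)" disagreement mentioned in the quoted text
is usually stated as the invariance theorem of algorithmic complexity.
Writing $K_U(x)$ for the length of the shortest program that makes the
universal machine $U$ print the string $x$ (notation assumed here, not
used in the quoted posting):

```latex
\[
  \bigl| K_U(x) - K_V(x) \bigr| \;\le\; c_{U,V}
  \qquad \text{for all strings } x ,
\]
```

where the constant $c_{U,V}$ depends only on the two machines $U$ and $V$,
roughly the length of a program that lets one emulate the other, and not
on $x$.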

This reminds me of a conversation I once had with a very well known quantum
chemist.  As I recall the conversation, we talked about indeterminacy in
the sense of Born--- quite a different meaning from the one you endorse; I tend
to agree that using ``entropy'' to denote so many different quantities is
terribly confusing, but all the alternative words are just as confusing!---
and he said something to the effect that anyone who says he understands QM is
either a liar or a fool.  I've read somewhere that the late Richard Feynman
was of a similar opinion.  From what I know of QM I'd have to agree with this
assessment, which makes me conjecture that the theory agrees with experiment
(so far) only by ``accident'', and that someday it will be replaced by a still
more sophisticated theory which does make sense.  Even if not, why should we use
such a ``crazy'' theory unless we absolutely need to?  Particle physicists and
quantum chemists put up with this theory because it's the only one which makes
accurate predictions in their area, but I think biologists should avoid invoking
it as far as possible, because of the uniquely problematic question of what the
heck QM could possibly ``mean''.

I doubt that it is necessary to make contact with PHYSICAL reality--- at the
length and time scales forming the realm of QM--- in order to make contact with
BIOLOGICAL reality.  I've never seen any convincing argument that QM holds the
answer to the mysteries of biological complexity, development, etc... and I've
seen so many UN-convincing arguments to this effect that I'd prefer not to
encounter any more! :) 

Is this dictum an example of my ``attitude problem''? :)

Chris Hillman

From owner-info-theory@net.bio.net Mon Feb 20 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!ns1.faseb.org!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: How does this group get this stuff?
Message-ID: <D45rAG.ss@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
References: <D3rGuK.9L2@world.std.com> <btluttbeg-100295165716@jimrice.ucdavis.edu> <D3u9np.Dt6@cunews.carleton.ca>
Date: Fri, 17 Feb 1995 18:56:39 GMT
Lines: 820

In article <D3u9np.Dt6@cunews.carleton.ca> etabello@arena.carleton.ca (Elias
Tabello) writes:

| Why not publish a FAQ and a short introductory note as many other
| newsgroups have done?

I have been posting FAQs for a long long time, near to the beginning of each
month.  It looks like they have to be every 2 weeks now because new people
don't look back more than that.  Here is the current FAQ.  Please note the
section about side discussions and inappropriate topics, entitled "What Can I
Do About Inappropriate Postings?".  Please let this be the last posting on this
topic.

From: toms@ncifcrf.gov (Tom Schneider)
Newsgroups: bionet.info-theory,news.answers
Subject: Biological Information Theory and Chowder Society FAQ
Summary: monthly Frequently Asked Questions posting for BITCS
 The news group bionet.info-theory is a forum for discussing information theory
 in biology and for tossing food for thought around.  Other interesting
 mathematical problems in biology are also welcome, as we will try our best to
 take the log of them, so as to convert them into information theory problems.
References:
Followup-To: bionet.info-theory
Distribution:
Organization: Frederick Cancer Research and Development Center
Keywords: FAQ, Biological Information Theory and Chowder Society
Approved: news-answers-request@MIT.Edu

Archive-name: biology/info-theory

***********************************************************

Replies to Frequently Asked Questions (FAQ) for bionet.info-theory

             Biological Information Theory and Chowder Society

version = 1.67 of bionet.info-theory.faq  1995 January 18

***********************************************************
- What is the History of The Biological Information Theory and Chowder Society?
- What Kind of Questions Are Appropriate For Discussion?
- How Can I Learn More About Information Theory and Biology?
- How Do I find Sequence Logos on the Web?
- Is There a Shell Script for Making Sequence Logos?
- Is There a Mosaic Page for Making Sequence Logos?
- Will Authors Send Me Papers?
- Can You Just Point My Mosaic To The FAQ and the Archives?
- How Do I obtain bionet.info-theory BY EMAIL?
- Where Did I Get This FAQ File From Originally?
- What is the IP number of the FAQ archive?
- Where Are the Bionet Archives?
- What Can I Do About Inappropriate Postings?
- What is the official word on copyright of this FAQ?
- Who Takes Care of This Group?
***********************************************************

* What is the History of The Biological Information Theory and Chowder Society?

The Biological Information Theory and Chowder Society (BITCS) is a group of
scientists interested in the biological applications of information theory
(thus the "BIT") who meet informally for dinner (thus the "CS") from time to
time in the Washington, DC, area.  At our dinners we have only one rule ---
food fights are discouraged.

The guys who started this thing did it because we weren't certain we understood
the biological implications of information theory.  Some of us are more
comfortable with the mathematical machinery and assemble biological systems
into grand canonical ensembles whether they want to be there or not; and some
of us think we understand what the biological systems are doing but can't
take a log to base 2.  What we try to do is pry from one another the bits of
knowledge that will help us understand what's going on.

Some of the topics up for discussion in our group are:
  biological applications of information theory
  biochemical molecular machines
  computer methods for recognition of molecular structure and function
  database organization for biomolecular information
  nanotechnology
  the limits of computation
  "dissipationless" (?) computation
  Maxwell's demon
  anecdotes and humor about all these topics
A few relevant papers are listed below.

The group started when Tom Schneider was introduced to John Spouge in 1988.
Tom bounced his ideas about molecular machines off John, and John kept finding
flaws.  Tom would go away rather unhappily for a month and then find a
solution.  But John was always one step ahead...  (and still is, on last
account.)  Tom gave a talk about molecular machines at the Lambda Lunch meeting
on the Bethesda NIH campus, and John introduced John (Steve) Garavelli.  We all
got together with Peter Basser for dinner once in a while to talk about
information theory.  Steve brought in one of the first people to apply
information theory to biology, Hubert Yockey.  Steve Garavelli dubbed the group
the "Biological Information Theory and Chowder Society", which it is still
called.  We are known sometimes as 'chowderheads', and talk about food fights,
but so far have only had electronic food fights!  We hold dinners in Bethesda
Maryland on random occasions.

When our informal mailing list became difficult to handle, we petitioned to
start a bionet news group.  We hope to hold roaring discussions, and everyone
is welcome to join.  If you are uncertain about something, quit lurking and ask
on the net.  It may well be that what bothered you is the key to a new piece of
information theory in biology.  (The major advances so far have come from
things that REALLY bugged people.)

We will also announce when and where our (irregular) eatings are and you are
welcome to join if the travel is not too far.  John Spouge
(spouge@ncbi.nlm.nih.gov), usually makes the arrangements.

***********************************************************

* What Kind of Questions Are Appropriate For Discussion?

This FAQ sheet answers simple questions about this group.  The BIG questions
should be discussed on the net, where we can all haggle over them.  Here are a
few for starters:

What is the role of theory in biology today?
What should be the role of biological theory?

What is information?  How should it be defined?

What bothers you when you read the two papers on the theory of molecular
machines?  (It is only from the things that bother us that we can make progress
in understanding.)  (See references below.)

What are flaws in the theory of molecular machines?

How is ATP used to drive molecular machines?

All communication systems are associated with living things, so is it true that
information theory is really a theory about living things?  Was Shannon really
a great biologist?

What does Maxwell's Demon have to do with all of this?

What are the limits of computers?

What are the limits of nanotechnology?

***********************************************************

* How Can I Learn More About Information Theory and Biology?

REFERENCES - General

There are a huge number of papers related to this topic, just about everything
in molecular biology, lots of chemistry, physics, electronics, evolutionary
theory, thermodynamics, statistical mechanics and the kitchen sink ...  You can
get a pretty good overview by combining the references of Schneider.ccmm,
Schneider.edmm and Leff1990.  References are given in BibTeX format, the
bibliography program associated with LaTeX, the powerful and portable
typesetting program.

By arrangement, books that have prices listed can be ordered over Internet from:
  Reiter's Scientific & Professional Books
  2021 K Street, NW
  Washington, DC  20006
  1-800-537-4314
  1-202-223-3327
  1-202-296-9103 FAX
  books@reiters.com

Shipping and handling charges are:
in the DC metropolitan area $4.00 for one item, $0.50 for each additional item,
outside the area $4.50 for one item, $0.50 for each additional item.

The prices are current as of October 1994; because publishers are constantly
changing their prices, they should be considered estimates rather than
guaranteed prices.  To open an account you must first either phone or FAX them
and provide a credit card number.  Book orders can be then placed at any time
over the Internet.
        **DO NOT SEND CREDIT CARD NUMBERS OVER THE INTERNET!**

Reiter's carries all of the books on this list except "Information Theory:
Saving Bits", and that one can be special ordered.  If enough interest in this
book is generated by the FAQ, it will be added as regular stock.  (It can also
be ordered directly from the company using the information given.)

# Gonick's Wonderful books (Don't be shy!  They are worth the money!!):

@book{Gonick.computers,
author = "L. Gonick",
title = "The Cartoon Guide to Computers",
edition = "second",
publisher = "HarperCollins",
address = "New York, NY",
isbn = "0-06-273097-5",
price = "price as of 1994 October 31: \$11.00",
year = "1991"}

@book{Gonick.genetics,
author = "L. Gonick",
title = "The Cartoon Guide to Genetics",
edition = "updated",
publisher = "Barnes \& Noble",
address = "New York, NY",
isbn = "0-06-273099-1",
price = "price as of 1994 October 31: \$12.00",
year = "1991"}

@book{Gonick.physics,
author = "L. Gonick
 and A. Huffman",
title = "The Cartoon Guide to Physics",
publisher = "HarperPerennial",
address = "New York, NY",
isbn = "0-06-273100-9",
price = "price as of 1994 October 31: \$12.00",
year = "1990"}

# A good starting point if you don't know much molecular biology:
# (Two volumes)

@book{Watson1987,
author = "J. D. Watson
 and N. H. Hopkins
 and J. W. Roberts
 and J. A. Steitz
 and A. M. Weiner",
title = "Molecular Biology of the Gene",
edition = "fourth",
publisher = "The Benjamin/Cummings Publishing Co., Inc.",
address = "Menlo Park, California",
isbn = "0-8053-9614-4",
price = "price as of 1994 October 31: \$59.95",
year = "1987"}

# This book describes LaTeX and BibTeX:

@book{Lamport1994,
author = "L. Lamport",
title = "\LaTeX: A Document Preparation System,
User's Guide \& Reference Manual",
edition = "second",
publisher = "Addison-Wesley Publishing Company",
address = "Reading, Massachusetts",
isbn = "0-201-52983-1",
price = "price as of 1994 October 31: \$32.95",
year = "1994"}

# ***********************************************************
# REFERENCES - Information Theory

# The best starter book:

@book{Pierce1980,
author = "J. R. Pierce",
title = "An Introduction to Information Theory:
Symbols, Signals and Noise",
edition = "second",
publisher = "Dover Publications, Inc.",
address = "New York",
isbn = "0-486-24061-4",
comment = "
original copyright 1961
Ordering information:  Pierce1980 is currently available by mail from:
   Dover Publications, Inc.
   31 East 2nd Street
   Mineola, New York 11501
order:
   Pierce, An Introduction to Information Theory: Symbols, Signals and Noise
   code number: 24061-4
$7.95 + charges.  Payment in full, no telephone or credit card orders.
Postage and Handling charges are:
Bookrate: $3 (US only)
UPS: $4.50 (US only, not Alaska or Hawaii or PO boxes)
Foreign orders: add 20% of total (minimum $2.50)
Sales Tax (NY residents only)
Foreign Orders Note: Remittances must be sent by international money order or
in U.S. funds via Federal Wire System to Chemical Bank, N. Y.  ABA #021000128.
Mark all remittances `For the account of Dover Publications, Inc.  #001 053
272'.  This information is from the Dover Math and Science Catalogue 9/92",
price = "price as of 1994 October 31: \$8.95",
year = "1980"}

Christopher Hillman (hillman@math.washington.edu) suggests that this one is a
better starting point:  Thomas Cover and Joy A. Thomas, Elements of Information
Theory, Wiley, 1991.  People who have seen both could post their opinions.

# A good introduction to the mathematics:

@book{Sacco1988,
author = "W. Sacco
 and W. Copes
 and C. Sloyer
 and R. Stark",
title = "Information Theory: Saving Bits",
publisher = "Janson Publications, Inc.",
comment = "original address was Providence, Rhode Island",
address = "Dedham, MA",
isbn = "0-939765-25-X",
phone = "(800) 322-6284",
price = "price as of 1994 October 31: \$11.95",
year = "1988"}

# Important originals:

@article{Shannon1948,
author = "C. E. Shannon",
title = "A Mathematical Theory of Communication",
year = "1948",
journal = "Bell System Tech. J.",
volume = "27",
pages = "379-423, 623-656"}

@book{ShannonWeaver1949,
author = "C. E. Shannon
 and W. Weaver",
title = "The Mathematical Theory of Communication",
publisher = "University of Illinois Press",
address = "Urbana",
isbn = "0-252-72548-4",
price = "price as of 1994 October 31: \$9.95",
year = "1949"}

@article{Shannon1949,
author = "C. E. Shannon",
title = "Communication in the Presence of Noise",
year = "1949",
journal = "Proc. IRE",
volume = "37",
pages = "10-21"}

# For the committed: The Complete Works!

@inproceedings{Shannon1993,
author = "C. E. Shannon",
editor = "N. J. A. Sloane and A. D. Wyner",
booktitle = "Claude Elwood Shannon: Collected Papers",
publisher = "IEEE Press",
address = "Piscataway, NJ",
isbn = "0-7803-0434-9",
comment = "IEEE Order Number: PC0331-9
  To order directly by charge card (eg Visa works) you can call
  (908)-981-0060
  $69.95 + $5 handling charge
  delivery in about 2 weeks",
price = "price as of 1994 October 31: \$69.95",
year = "1993"}

# How locks work and other cool stuff:

@book{Macaulay1988,
author = "D. Macaulay",
title = "The Way Things Work",
publisher = "Houghton Mifflin Company",
address = "Boston",
isbn = "0-395-42857-2",
price = "price as of 1994 October 31: \$29.95",
comment = "This book is also available on Windows-Compatible CD-ROM
  cdrom isbn = 1-56458-901-3  Price as of 1994 October 31: \$99.95",
year = "1988"}

# Leff1990 gives a review of the Maxwell's Demon problem.
# See also Schneider.edmm, listed below.

@book{Leff1990,
author = "H. S. Leff and A. F. Rex",
title = "Maxwell's Demon: Entropy, Information, Computing",
publisher = "Princeton University Press",
address = "Princeton, N. J.",
phone = "1(800) 777-4726",
isbn.hard = "0-691-08726-1 (hard cover)",
price.hard = "price as of 1994 October 31: \$80.00",
isbn.paper = "0-691-08727-X (paperback)",
price.paper = "price as of 1994 October 31: \$26.95",
year = "1990"}

# ***********************************************************

# REFERENCES - Jaynes

@article{JaynesI,
  author = "Edwin T. Jaynes",
  title = "Information Theory and Statistical Mechanics",
  year = 1957,
  journal = "Physical Review",
  volume = "106",
  pages = "620-630"}

@article{JaynesII,
  author = "Edwin T. Jaynes",
  title = "Information Theory and Statistical Mechanics. {II}",
  year = 1957,
  journal = "Physical Review",
  volume = "108",
  pages = "171-190"}

# A version of Jaynes' new book "PROBABILITY THEORY -- THE LOGIC OF SCIENCE"
# is available on the net.  See:
#
# ftp://bayes.wustl.edu/Jaynes.book/
# Larry Bretthorst (larry@bayes.wustl.edu)
#
# http://omega.albany.edu:8008/JaynesBook.html
# Carlos Rodriguez (carlos@math.albany.edu)
#
# Tom Schneider's pointers to these places:
# http://www.ncifcrf.gov:2001/~toms/Jaynes.html
#
# Note:  The book is being written now and new versions come out every once in a
# while.  One of these locations may be more up to date than the other.

# ***********************************************************
# REFERENCES - Schneider

@article{Schneider1986,
author = "T. D. Schneider
 and G. D. Stormo
 and L. Gold
 and A. Ehrenfeucht",
title = "Information content of binding sites on nucleotide sequences",
journal = "J. Mol. Biol.",
volume = "188",
pages = "415-431",
year = "1986"}

@inproceedings{Schneider1988,
author = "T. D. Schneider",
editor = "G. J. Erickson and C. R. Smith",
title = "Information and entropy of patterns in genetic switches",
booktitle = "Maximum-Entropy and Bayesian Methods in Science and Engineering",
volume = "2",
pages = "147-154",
publisher = "Kluwer Academic Publishers",
address = "Dordrecht, The Netherlands",
year = "1988"}

@article{Schneider1989,
author = "T. D. Schneider
 and G. D. Stormo",
title = "Excess Information at Bacteriophage {T7} Genomic Promoters
Detected by a Random Cloning Technique",
year = "1989",
journal = "Nucl. Acids Res.",
volume = "17",
pages = "659-674"}

@article{Schneider.Stephens.Logo,
author = "T. D. Schneider
 and R. M. Stephens",
title = "Sequence Logos: A New Way to Display Consensus Sequences",
journal = "Nucl. Acids Res.",
volume = "18",
pages = "6097-6100",
year = "1990"}

@article{Schneider.ccmm,
author = "T. D. Schneider",
title = "Theory of Molecular Machines.
{I. Channel} Capacity of Molecular Machines",
journal = "J. Theor. Biol.",
volume = "148",
number = "1",
pages = "83-123",
note = "{(Note: The figures were printed out of order!
Fig. 1 is on p. 97.)}",
year = 1991}

@article{Schneider.edmm,
author = "T. D. Schneider",
title = "Theory of Molecular Machines.
{II. Energy} Dissipation from Molecular Machines",
journal = "J. Theor. Biol.",
volume = "148",
number = "1",
pages = "125-137",
year = 1991}

@article{Herman.Schneider1992,
author = "N. D. Herman
  and T. D. Schneider",
title = "High Information Conservation Implies that at Least Three Proteins
Bind Independently to {F} Plasmid {{\em incD\/}} Repeats",
journal = "J. Bact.",
volume = "174",
pages = "3558-3560",
year = "1992"}

@article{Stephens.Schneider.Splice,
author = "R. M. Stephens
  and T. D. Schneider",
title = "Features of spliceosome evolution and function
inferred from an analysis of the information at human splice sites",
journal = "J. Mol. Biol.",
volume = "228",
pages = "1124-1136",
year = "1992"}

@article{Papp.helixrepa,
author = "P. P. Papp
 and D. K. Chattoraj
 and T. D. Schneider",
title = "Information Analysis of Sequences that Bind
the Replication Initiator {RepA}",
journal = "J. Mol. Biol.",
comment = "Cover of 233, number 2!",
volume = "233",
pages = "219-230",
year = "1993"}

@article{Schneider.nano2,
author = "T. D. Schneider",
title = "Sequence Logos, Machine/Channel Capacity,
{Maxwell}'s Demon, and Molecular Computers:
a Review of the Theory of Molecular Machines",
journal = "Nanotechnology",
volume = "5",
number = "1",
pages = "1-18",
year = "1994"}
ftp://ftp.ncifcrf.gov/pub/delila/nano2.ps

# ***********************************************************
# REFERENCES - Yockey

@book{Yockey1958a,
editor = "Hubert P. Yockey and Robert P. Platzman and Henry Quastler",
title = "Symposium on Information Theory in Biology",
booktitle = "Symposium on Information Theory in Biology",
publisher = "Pergamon Press",
address = "New York, London",
comment = "out of print",
year = "1958"}

@article{Yockey1981,
author = "Hubert P. Yockey",
year = 1981,
title = "Self-organization Origin of Life Scenarios and Information Theory",
journal = "J. Theor. Biol.",
volume = "91",
pages = "13-31"}

@book{Yockey1992,
author = "H. P. Yockey",
title = "Information Theory and Molecular Biology",
publisher = "Cambridge University Press",
address = "Cambridge",
isbn = "0-521-35005-0",
comment = "40 West 20th Street,
New York, N. Y.  10011-4211,
order number 350050",
phone = "1-800-827-7423",
price = "price as of 1994 October 31: \$74.95",
year = "1992"}

Following is Hubert Yockey's reference list:

Yockey, Hubert P. Information Theory and Molecular Biology, Cambridge UK:
Cambridge University Press (1992)
Yockey, Hubert P. (1990). When is random random? Nature, 344, 823.
Yockey, Hubert P. (1981). Self-organization origin of life scenarios and
information theory. Journal of Theoretical Biology, 91, 13-31.
Yockey, Hubert P. (1979). Do overlapping genes violate molecular biology and
the theory of evolution? Journal of Theoretical Biology, 80, 21-26.
Yockey, Hubert P. (1978). Can the Central Dogma be derived from information
theory? Journal of Theoretical Biology, 74, 149-152.
Yockey, Hubert P. (1977a). A prescription which predicts functionally
equivalent residues at given sites in protein sequences. Journal of
Theoretical Biology, 67, 337-343.
Yockey, Hubert P. (1977b). On the information content of cytochrome c.
Journal of Theoretical Biology, 67, 345-376.
Yockey, Hubert P. (1977c). A calculation of the probability of spontaneous
biogenesis by information theory. Journal of Theoretical Biology, 67,
377-398.
Yockey, Hubert P. (1974). An application of information theory to the Central
Dogma and the sequence hypothesis. Journal of Theoretical Biology, 46,
369-406.
Yockey, Hubert P. (1960). The Use of Information Theory in Aging and Radiation
Damage. In The Biology of Aging, American Institute of Biological Sciences
Symposium No. 6 (160), pp338-347.
Yockey, Hubert P., Platzman, Robert P. & Quastler, Henry, eds. (1958a).
Symposium on Information Theory in Biology, New York, London: Pergamon Press.
Yockey, Hubert P. (1958b). A study of aging, thermal killing and radiation
damage by information theory. In Symposium on Information Theory in Biology.
eds. Hubert P. Yockey, Robert Platzman & Henry Quastler, pp297-316. New York,
London: Pergamon Press.
Yockey, Hubert P. (1956). An application of information theory to the physics
of tissue damage. Radiation Research, 5, 146-155.

***********************************************************

* Will Authors Send Me Papers?

Tom Schneider will mail you copies of his papers.  Send your physical address
to him at toms@ncifcrf.gov.  Some papers are already online; see also the
README file in the ftp archive ftp.ncifcrf.gov in pub/delila.

If you are willing to send out papers or have papers you would like listed
here, please contact Tom Schneider.

You can request them by Mosaic from the page
http://www.ncifcrf.gov:2001/~toms/papers.html

***********************************************************

* How Do I find Sequence Logos on the Web?

http://www.ncifcrf.gov:2001/~toms/sequencelogo.html

***********************************************************

* Is There a Shell Script for Making Sequence Logos?

Yes, you will find the one Shmuel Pietrokovski wrote in the ftp archive
ftp.ncifcrf.gov in pub/delila/logoaid.  (Also available in
bioinformatics.weizmann.ac.il/pub/software/logoaid.)

***********************************************************

* Is There a Mosaic Page for Making Sequence Logos?

Yes, Steve Brenner has done it!

http://www.bio.cam.ac.uk/seqlogo/

***********************************************************

* Can You Just Point My Mosaic To The FAQ and the Archives?

This file and the postings may be obtained by Mosaic through the world wide web
at:

<UL>

<H2>
<LI> <A HREF="ftp://ftp.ncifcrf.gov/pub/delila/bionet.info-theory.faq.Z">
FAQ (Frequently Asked Questions)</A>
about the Biological Information Theory and Chowder Society
</H2>

<H2>
<LI> <A HREF="gopher://net.bio.net/11/BIO-INFO">
Gopher Link to Archive of All Postings.</A>
This archive contains the most recent postings
as separate documents.</H2>

<H2>
<LI> <A HREF="ftp://net.bio.net/pub/BIOSCI/BIO-INFO">
Archive of Monthly Postings.</A>
This archive (currently) contains postings
from each month as a single document.</H2>

<H2>
<LI> <A HREF="ftp://ftp.bio.indiana.edu/usenet/bionet/info-theory/">
FTP Link to Archive of Postings at IUBIO.</A>
This archive contains individual postings.
Older postings are collected by the month as a single document.
There is an index for each month.
</H2>

</UL>

***********************************************************

* How Do I obtain bionet.info-theory BY EMAIL?

If you have access to USENET news YOU DO NOT NEED AN E-MAIL SUBSCRIPTION!!  We
strongly encourage all interested users to explore getting USENET news at your
site.  It's MUCH easier on you than an e-mail subscription!  Please consult
your systems manager or contact biosci-help@net.bio.net for assistance if
needed.

The BIOSCI (email) name for the forum is BIO-INFO.

Depending on where you are, you have to do different things to subscribe or be
removed from the email subscription list:

SUBSCRIBING / UNSUBSCRIBING
   North or South America or Pacific Rim:
     Using the computer account in which you want to receive mail
     messages, please send an email message to the e-mail server at
     biosci-server@net.bio.net.  Leave the Subject: line blank.
     In the body of the message include the line

     subscribe bio-info

     to add yourself to the mailing list or

     unsubscribe bio-info

     to cancel an existing subscription.  If you need personal
     subscription assistance, please contact biosci-help@net.bio.net.

   Europe, Africa, and Central Asia:
     Send an email message to the person at
       biosci@daresbury.ac.uk
     requesting a subscription or removal from the BIO-INFO forum.
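
As a rough illustration of the server request described above for North or
South America or the Pacific Rim, the message could be built as follows.  This
is only a sketch: the helper name subscription_request and the example sender
address are made up here, and the message is constructed but not sent.

```python
from email.message import EmailMessage

# Server address as given in the FAQ text.
SERVER = "biosci-server@net.bio.net"


def subscription_request(sender, action="subscribe", forum="bio-info"):
    """Build (but do not send) a request for the BIOSCI e-mail server.

    Hypothetical helper for illustration; only the address and the
    'subscribe bio-info' / 'unsubscribe bio-info' lines come from the FAQ.
    """
    if action not in ("subscribe", "unsubscribe"):
        raise ValueError("action must be 'subscribe' or 'unsubscribe'")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = SERVER
    # No Subject header is set, matching "Leave the Subject: line blank".
    msg.set_content(f"{action} {forum}\n")
    return msg


if __name__ == "__main__":
    print(subscription_request("reader@example.org"))
```

Send the message from the account in which you want to receive the mail.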

SENDING OUT POSTINGS
Thereafter, address email messages for this forum to one of:

   North or South America or Pacific Rim:
      bio-info@net.bio.net

   Europe, Africa, and Central Asia:
      bio-info@daresbury.ac.uk

You can post to either of the above addresses if you want.  We only request that
you sign up at your local node in order to optimize the use of the network
resources for message distribution.

Do not send subscription requests to any of these addresses, or you will have
sent it to everybody on the planet (to your great embarrassment, and we will
drub you with food cake)!  Let me say that again:  please do not post requests
for subscription or being removed from the list to the list itself, that takes
up bandwidth all over the world!

If you have problems, contact the subscription site manager who you signed up
with.  If your problem is not resolved, please contact
biosci-help@net.bio.net.

DO NOT CONTACT TOM SCHNEIDER FOR SUBSCRIPTIONS OR UNSUBSCRIBING!

***********************************************************

* Where Did I Get This FAQ File From Originally?

The latest version of this FAQ is stored in the anonymous ftp archive
ftp.ncifcrf.gov in pub/delila under the name bionet.info-theory.faq and also as
bionet.info-theory.faq.Z (The .Z means it is compressed; remember to use the
binary transfer mode if you pick up the latter.  See the uncompressed README
file in the archive for where to get the uncompress program if you need it.)
Please send questions and comments to: Tom Schneider toms@ncifcrf.gov
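
For readers who script their transfers, the binary-mode advice above might
look like this with Python's standard ftplib.  This is a sketch only: the
helper names fetch_binary and fetch_faq are invented here, the host and path
are the ones quoted in this FAQ and may not be reachable, and Python's standard
library does not uncompress .Z files, so a separate uncompress program is
still needed.

```python
from ftplib import FTP

# Host, directory and file name as quoted in the FAQ text.
HOST = "ftp.ncifcrf.gov"
DIRECTORY = "pub/delila"
NAME = "bionet.info-theory.faq.Z"


def fetch_binary(ftp, directory, name, out):
    """Download `name` from `directory` into `out` using binary mode.

    `ftp` is any object with ftplib.FTP's cwd() and retrbinary() methods;
    `out` is any object with a write() method that accepts bytes.
    """
    ftp.cwd(directory)
    # retrbinary() switches to binary (image) mode before issuing RETR,
    # which is what a compressed .Z file requires.
    ftp.retrbinary("RETR " + name, out.write)


def fetch_faq(path=NAME):
    """Hypothetical convenience wrapper: anonymous login, then fetch."""
    with FTP(HOST) as ftp, open(path, "wb") as out:
        ftp.login()  # no arguments means an anonymous login
        fetch_binary(ftp, DIRECTORY, NAME, out)
```

Nothing is downloaded until fetch_faq() is actually called.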

This file is also posted monthly on news.answers and bionet.info-theory.

***********************************************************

* What is the IP number of the FAQ archive?

For ftp.ncifcrf.gov you can use 129.43.1.11

***********************************************************

* Where Are the Bionet Archives?

The entire collection of BIOSCI/bionet messages from inception is
available via the biosci.src WAIS source at net.bio.net.  Contact
biosci-help@net.bio.net for further help with accessing this WAIS source.

FTP archives of all the BIOSCI/bionet messages are available at net.bio.net
[134.172.2.69] in /pub/BIOSCI.  bionet.info-theory is in pub/BIOSCI/BIO-INFO.
Files are in mailbox format, with names of the form YYMM (YY=last 2 digits of
the year, MM=cardinal number of the month, zero padded).  The current month's
postings are in the file 'current'.  Contact biosci-help@net.bio.net for
further help with or comments on the archives.
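
The YYMM naming rule and the mailbox format can be illustrated with a short
Python sketch.  The function names archive_name and subjects are invented for
this example, and the sketch assumes you have already copied an archive file
to your own machine.

```python
import datetime
import mailbox


def archive_name(date):
    """Return the YYMM archive file name for the month containing `date`."""
    # YY = last two digits of the year, MM = month number, both zero padded.
    return f"{date.year % 100:02d}{date.month:02d}"


def subjects(path):
    """Yield the Subject header of each message in a local mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            yield message["Subject"]
    finally:
        box.close()


if __name__ == "__main__":
    # File name that would hold the February 1995 postings.
    print(archive_name(datetime.date(1995, 2, 18)))
```

The current month's postings are the exception: they are in the file
'current' rather than under a YYMM name.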

All the bionet.* newsgroups, including info-theory, are also archived for
Gopher retrieval from the IUBIO Gopher hole and for anonymous ftp from
ftp.bio.indiana.edu, directory usenet/bionet/...

The archives can be accessed by gopher or ftp running under the Mosaic
interface.  The URL (Uniform Resource Locator) for gopher is:
   gopher://net.bio.net/11/BIO-INFO
This gives individual postings.  By ftp one can use:
   ftp://net.bio.net/pub/BIOSCI/BIO-INFO
Unfortunately this gives the entire month in a single document.

***********************************************************

* What Can I Do About Inappropriate Postings?

The short form of this news group's name, bio-info, can be a little confusing
to some people inexperienced in network communications or with little knowledge
of the discipline (if there is any :-) of biological information theory.  It
can and has been mistaken as a news group for general biological information.
Our readers should be aware that when such postings come to our attention, we
do attempt to inform, privately, the people who make these inappropriate
postings of the error of their ways and suggest alternative or more appropriate
venues.

Subjecting the writers of inappropriate postings to public excoriation is not a
good policy because the mistake is usually inadvertent and the follow-up
postings add further to the irritation of our regular readers.  When others
publicly reply to such posts in this news group, although they may think they
are being polite to the original poster, they are still annoying our regular
readers.  We suggest that a better policy for readers who do wish to reply to
inappropriate posts is to do so privately or to an appropriate news group.

***********************************************************

* What is the official word on copyright of this FAQ?

This FAQ fits the description in the U. S. Copyright Act of a "United States
Government work".  It was written as a part of my official duties as a
Government employee.  This means it cannot be copyrighted.  The article is
freely available without a copyright notice, and there are no restrictions on
its use, now or subsequently.  I retain no rights in the FAQ.

Thomas D. Schneider

***********************************************************

* Who Takes Care of This Group?

John S. Garavelli
Protein Information Resource
National Biomedical Research Foundation
Washington, DC  20007
garavelli@gunbrf.bitnet

Tom Schneider
National Cancer Institute
Laboratory of Mathematical Biology
Frederick, Maryland  21702-1201
toms@ncifcrf.gov

John L. Spouge
National Center for Biotechnology Information
National Library of Medicine
Bethesda, MD  20894
spouge@frodo.nlm.nih.gov

Please email comments and suggestions on this FAQ sheet to Tom.

John Garavelli (who also answers to "Steve" if you want to avoid confusion)
often organizes dinner speakers.

John Spouge often arranges dinner locations.

***********************************************************

From owner-info-theory@net.bio.net Mon Feb 20 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!news.sprintlink.net!EU.net!ieunet!maths.tcd.ie!not-for-mail
From: tim@maths.tcd.ie (Timothy Murphy)
Newsgroups: bionet.info-theory
Subject: Re: Estimation of Entropies from a Finite Amount of Data
Date: 21 Feb 1995 03:02:37 -0000
Organization: Dept. of Maths, Trinity College, Dublin, Ireland.
Lines: 43
Message-ID: <3ibl4d$s8a@hamilton.maths.tcd.ie>
References: <3i24uq$53f@hamilton.maths.tcd.ie> <3i2ubm$6ru@news.u.washington.edu> <3i5eov$c0s@hamilton.maths.tcd.ie> <D49oF1.4HD@watdragon.uwaterloo.ca> <3i90pt$h31@hamilton.maths.tcd.ie> <D4BH8L.5or@watdragon.uwaterloo.ca>
NNTP-Posting-Host: hamilton.maths.tcd.ie
Keywords: Chaitin, Shannon, algorithmic complexity, law of iterated logarithm

tromp@daisy.uwaterloo.ca (John Tromp) writes:

>That still doesn't solve the problem, since for any machine U, there
>are infinitely many machines V_0, V_1, ... such that V_i is almost
>functionally identical to U, and none of the V_i is functionally
>identical to another V_j.

>More precisely, V_i halts only on inputs of the form
>x00 or x1. It simulates U on x; suppose U halts with output y.
>If y <> i, it will output y and halt. If y=i though, it will only
>output y and halt if the input was x00. Thus all strings except i will incur
>a penalty of 1 relative to U, and the string i will incur a penalty of 2.

>This is enough to make the above sum, or anything like it, diverge.

I didn't understand your example.

However, with the distance between universal machines that I suggested,
namely

	d(U,V) =  |u| + |v|

where u is a prefix for U to emulate V,
and v is a prefix for V to emulate U, ie

	U(up) = V(p), V(vp) = U(p),

then there are at most 2^n machines within distance n of a given machine,
since a machine is certainly completely defined by giving its U-prefix.
Hence

	\sum_V 2^{-2d(U,V)}

must converge.
So there certainly _are_ sums of this kind which converge.
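
Spelled out, the last step runs roughly as follows.  This is a sketch: N_n is
notation introduced here for the number of machines V with d(U,V) = n, and the
bound N_n <= 2^n is the counting bound stated above.

```latex
% N_n = number of machines V with d(U,V) = n   (notation introduced here)
% Counting bound from the argument above: N_n \le 2^n
\sum_V 2^{-2d(U,V)}
  = \sum_{n \ge 0} N_n \, 2^{-2n}
  \le \sum_{n \ge 0} 2^{n} \, 2^{-2n}
  = \sum_{n \ge 0} 2^{-n}
  = 2 .
```

Even the cruder count of every prefix of length at most n, which is fewer
than 2^{n+1}, only doubles the bound, so the sum still converges.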



-- 
Timothy Murphy  
e-mail: tim@maths.tcd.ie
tel: +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland

From owner-info-theory@net.bio.net Mon Feb 20 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!ns1.faseb.org!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: n-Grams
Message-ID: <D46Cy4.6qC@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
Date: Sat, 18 Feb 1995 02:44:28 GMT
Lines: 25

The latest Science has an interesting article:

@article{Damashek1995,
author = "M. Damashek",
title = "Gauging Similarity with n-Grams:
Language-Independent Categorization of Text",
journal = "Science",
volume = "267",
pages = "843-848",
year = "1995"} 

References 5 and 6 are to Shannon's work on n-gram information content, and so
are relevant to our recent conversation.  The method described in this paper by
Damashek is very clean and so the paper is worth reading.

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www.ncifcrf.gov:2001/~toms
soon: 
  http://www-lmmb.ncifcrf.gov/~toms
works now: 
  http://www-lmmb.ncifcrf.gov/

From owner-info-theory@net.bio.net Tue Feb 21 22:00:00 1995
Path: biosci!agate!howland.reston.ans.net!pipex!sunsite.doc.ic.ac.uk!alder.cc.kcl.ac.uk!udah289
From: udah289@bay.cc.kcl.ac.uk
Newsgroups: bionet.info-theory
Subject: "Complexity" and Development
Message-ID: <1995Feb22.133521.6482@alder.cc.kcl.ac.uk>
Date: 22 Feb 95 13:35:21 GMT
Organization: King's College London
Lines: 29

Dear All
	I hope the following question isn't too naive, as I've only just
discovered this newsgroup. I'm highly interested in modelling developmental
processes, and I've been trying to think about the following...

Suppose one had a simple genome of length n1 that we use in a developmental
model that starts out as a single cell and develops into some simple organism.
If we think of the developmental process whereby each cell divides etc as some
sort of Turing Machine, then one might want to say that some programs (ie
genomes) halt, some don't halt, and some eventually halt. But given that our
genome is of length n1, we might expect that there is only a certain amount of
information contained within it.

Thus however long the developmental process runs for, there is only a certain
level of "complexity" that the completed organism can have. So with a longer
genome of length n2>n1, one might expect that a greater level of
"complexity" is possible.

Is there some sort of measure of the "complexity" of an organism (or a
simulated organism more likely) that is along these lines?
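
To make the "only a certain amount of information" point concrete: a sequence
of n symbols over an alphabet of k letters can carry at most n*log2(k) bits.
The sketch below assumes a four-letter alphabet (as for DNA), which is an
assumption added here; the function name genome_capacity_bits is invented for
illustration.

```python
import math


def genome_capacity_bits(length, alphabet_size=4):
    """Upper bound, in bits, on the information a sequence can carry.

    Assumes each position is one of `alphabet_size` symbols; the default
    of 4 corresponds to a DNA-like alphabet (an assumption, not from the
    post).  Real genomes generally carry less than this maximum.
    """
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    n1, n2 = 1000, 5000
    # A longer genome has a proportionally larger upper bound.
    print(genome_capacity_bits(n1), genome_capacity_bits(n2))
```

This is only a ceiling on the genome's own information content; it does not
by itself define the "complexity" of the organism that develops from it.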

Any suggestions would be appreciated

Steve Hill
(s.hill@bay.cc.kcl.ac.uk)





From owner-info-theory@net.bio.net Tue Feb 21 22:00:00 1995
Path: biosci!daresbury!bioftp.unibas.ch!citi2.fr!jussieu.fr!oleane!pipex!swrinde!sgiblab!rpal.rockwell.com!news.Stanford.EDU!usenet
From: ardell@dobzhansky.Stanford.EDU (David Ardell)
Newsgroups: bionet.info-theory
Subject: A cross-over from comp.ai.alife
Date: 22 Feb 1995 10:10:57 GMT
Organization: Stanford University
Lines: 188
Message-ID: <3if2jh$dor@nntp.Stanford.EDU>
NNTP-Posting-Host: dobzhansky.stanford.edu

I saw Dr. Amenyo's post and thought my reply to this other newsgroup might  
provide some contrast and comparison. There are some concepts here which  
are applicable to the ideas in this group, and let me just say that I  
aspire to being able to formalize these consistently, precisely, and  
accurately. Comments and criticisms are welcome. 

Of particular interest in this article which I would like to see critiqued  
is the information-oriented definition of life I provide in the first  
paragraph. There is also a reliance on Kolmogorov complexity which for me  
raises the following types of questions. Please forgive me if these are  
old hat to you. 

"What is the information content of the Mandelbrot set?" Clearly the  
algorithmic complexity required to generate the Mandelbrot set is finite.  
I believe that the information needed to describe the Mandelbrot set in  
its entirety is unbounded, as opposed to a simpler fractal like the  
Middle-Thirds Cantor Set.  When there is fractal structure to a universe,  
yet imperfect self-similarity, is there some bound on its information  
content? Does this depend on the discrete or continuous properties of the  
underlying set? Supposing an infinite information content of a given set,  
how does increasing the dimension of that set increase its  
information-content? I hope there is more to say about these questions  
than "it depends what you mean by information." What I would like to work  
towards is a characterization of information as a physical quantity.
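
To illustrate the remark about finite algorithmic complexity, here is roughly
how short a program that tests membership in the Mandelbrot set can be.  This
is a sketch: the function name in_mandelbrot and the iteration cap are choices
made for this example, and with any finite cap the test is only an
approximation near the boundary of the set.

```python
def in_mandelbrot(c, max_iter=200):
    """Approximate test: does z -> z*z + c stay bounded starting from 0?

    A point is rejected as soon as |z| exceeds 2, which guarantees escape.
    A point that survives max_iter steps is accepted, so points very close
    to the boundary may be misclassified.
    """
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return False
    return True


if __name__ == "__main__":
    # Coarse character plot of the region -2 <= Re <= 1, -1 <= Im <= 1.
    for row in range(21):
        im = 1.0 - row * 0.1
        print("".join(
            "#" if in_mandelbrot(complex(-2.0 + col * 0.05, im)) else "."
            for col in range(61)
        ))
```

The program stays the same few lines no matter how finely the set is
sampled; only the iteration cap and the grid spacing change.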

While I have availed myself of some of the references made available by  
this group, there cannot be a substitute for discussion. Please feel free  
to mail me directly. Thank you for your time.

I am a first-year graduate student in Theoretical Biology at Stanford  
working under Marc Feldman.
___________________________________________

>Achim Clausing (cl@math.uni-muenster.de) wrote:  
>Hallo Shane,   But
>I suspect that this approach towards an understanding of "What is
>life" misses the quintessence of real life in a profound way:
>Artificial life is unavoidably embedded into an outer world, the
>real one, whereas real life  is not embedded. A computer program
>has no means to make inquiries about itself - this self is something
>not existing in the world of information but in the outer, real,
>world of the computer hardware (a real thing). Without this environment
>the program is unable to exist.  So whatever is created in software
>will lack one feature I consider as necessary for life: Life is
>"context-free". It does not need a superstructure in order to happen.
>It has the chance to understand itself. A program never has this
>chance, it is embedded into something to which it has no connection
>whatsoever.  Alife programs may resemble life, even closely in some
>aspects. But they are very dead unless someone from the outside
>(the superstructure - we) is looking at them and interpreting them.
>They need a separate context. We don't - we are our own context.
>So my these is: True life is context-free, hence it cannot be created
>since otherwise it would have the context of it's creator. (I just
>proved that God does not exist ;-) And thus building second-order,
>third-order, whatever higher order structures on the computer
>obviously is of no help. Life is first order or it isn't life.
>
>Achim Clausing 
>Universitaet Muenster Fachbereich Mathematik
>email: cl@math.uni-muenster.de
>Einsteinstr. 62                   phone: +251 83 3779  D-48149
>Muenster                  fax  : +251 83 8370 

I would like to see a proof of this point that real life is not embedded.  
In fact, the qualities you are using to make your point are ill-defined,  
so I am not sure exactly what you are saying. "Life" as best as I can  
define it is a class of dynamical behaviors in a universe 1) far from  
equilibrium and 2) organizationally superfluous, in the sense that there  
is arbitrary (unbounded?) complexity in the domain structure of that  
universe, which must consist of at least three elements. This second  
property distinguishes life as a physical process from other classical  
dynamics studied in physics: principles such as the minimization of action  
permeate our existing understanding of reality. We are used to nature  
being lazy -- and the question is why does she go to all the excruciating  
trouble of being alive? 

I will briefly try to explain the "three elements" part of my definition  
and move on to critique your argument. One such three-element universe  
would be the sequence space generated by <0,1>, or the language {0,1}*.  
Imagine this as a one-dimensional universe (a point-universe) coming in  
and out of existence along the axis of time. By definition this point has  
no structure along any axis except that of time, and yet arbitrarily  
complex computation can be expressed by that system, by a dynamic which is  
heretofore unexpressed. The question of what causes that dynamic to  
impinge on our point in question I will relegate as isomorphic to the  
question of why does our universe obey particular physical laws. What is  
the nature of causality? That's a damn good question. Anyway, the point  
here is that our point, being whisked in and out of existence according to  
this dynamic, has no ability to detect its dynamic. In fact, if we were to  
make our point self-aware, it would conceive of itself as always in  
existence. The reason is that there is no room within its zero-dimensional  
existence to store information about its one dimensional time dynamic. No  
room for memory, no watches, no notepads. When it is not in existence,  
it's not around to notice it's gone! 

If you are interested in the threeness part of the above argument so far,  
I refer you to the philosophy of Charles Sanders Peirce, American chemist  
and founder of pragmatism and (regarded by some as founder of) semiotics.

In order to critique your point, I need to make a clarification. Although  
I characterized life as a "dynamic", and my example above used time as one  
of the three elements in my universe (the others being 0 and 1, or  
existence and nothingness), technically I don't require time as an element  
in a living universe. Just as probability (through the ergodic principle)  
can be regarded either as a proportion of event frequencies or as a way of  
stating the likelihood of a single event, just as the computational  
complexity of an algorithm balances time and computer memory, the complex  
dynamic of life can exist either through time or space. A timeless  
universe can be alive according to this definition! What would it mean for  
a timeless universe to be far from equilibrium? One possibility is that it  
would require an arbitrarily long description to specify the spatial  
structure of that universe.

Anyway, I'm willing (for now) to restrict my definition of life to  
universes with time, and characterize it as a dynamic. What I wish to  
extend from this is that all life is embedded. Life in our universe is  
embedded in the fabric of space-time. And so is life on a computer. The  
living medium is either matter or information. In fact, both in a computer  
and in our bodies, matter, energy and information can all independently  
and together be regarded as alive by my definition. And all three are
components of both universes. You may see how they are all components of
your own body, and they are all also components of an alife simulation,  
embedded as it is in computer processors which operate by means of the  
solid state (not all alife is on a computer of course, and it is fun to  
ask yourself how a mathematical formalism can be alive by my definition).

I haven't here made any proof of the following ideas, but I hope you can  
take the ideas presented so far and sketch a justification of the  
following claims for yourself. 1) In much of the universe we are currently  
paying attention to, all life is embedded in space and time. Life is  
further embedded in the rules that govern the nature of causality within a  
living universe. Therefore it is not fair to say that alife is not "real  
life" simply by the contrast you draw in embeddedness, because our life is  
embedded. 2) We cannot know if there are embeddings of our own universe  
orthogonal to our self-circumscribed reality. In this sense we are no  
better than the organisms in our computers. This argument is akin to the  
skepticism of Bishop Berkeley, which has to my knowledge never been fully  
resolved. 3) Life is a property of universes, which are circumscribed by  
an act of interpretation. Therefore by my definition, our universe is  
alive. A universe which contains a living universe is alive. 4) Because of  
this, an embedded universe can in fact be alive. In fact, an artificial  
universe such as a cellular automaton can encode hypotheses about its own  
existence even though some may in fact be untestable (that is in fact what  
I'm doing right now). 5) There may in fact be recursive embedding of  
living universes, and it is this which contributes to the unbounded  
complexity of living universes.  

So now I will try to directly address your other claims on your own terms.  
All embedded living universes are in fact not context-free, and in fact  
our own life is not context-free because it is embedded in space, time,  
and rules of causality.  In circumscribing our universe we may borrow from  
statistics the ideas of "parametric" and "sample" entities. Here we have  
the class of all observables informing our sample universe, and by  
definition any non-observables which may exist form one non-observable way  
in which our sample universe and any parametric universe may diverge. What  
is observable is the existence of uncertainty -- in everyday life. We are  
in fact not omniscient, so the pragmatic trick of discounting  
non-observables epistemologically is in my mind not wholly justified. 

By saying "we are our own context" you are tautologically right. Yet by  
the above and in other important ways, we are not our only context. It is  
hubris to think that we are more than the universe that spawned us. Life  
and consciousness are heretofore ill-defined but pragmatic concepts we  
have invented to explain what we perceive as the difference between humans  
and the rest of the universe, and yet we communicate these concepts to  
each other in a way that cows and planets could not understand whether  
they possessed the capacity to or not (by this I am only suggesting that  
our media of communication are tuned to our sensory abilities and as such  
one could imagine epistemological obstacles on the part of other disjoint  
living entities to perceive _us_ as alive or conscious). 

So in a very real and final way, the "we" you refer to certainly is its  
own context. We can continue to believe in consciousness and other  
ego-candy to feed our need to distinguish ourselves from the rest of the  
universe (cf the doctrine of geocentricism), or perhaps we may one day  
find pragmatic, parsimonious, aesthetic, philosophical, moral and
scientific value in abandoning these ill-conceived notions in favor of  
ones that embed us within our supercedent universe -- itself truly alive.

Well, if you've made it to the bottom of this article, thanks for reading  
it. Good day.

---
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*
David ARDELL	<<Nothing is "mere">> -- Richard P. Feynman
Department of Biological Sciences, Stanford University
Stanford, CA 94305-5020	 Phone: 415-723-4952   Fax: 415-725-8244
NeXTMail welcomed.

From owner-info-theory@net.bio.net Tue Feb 21 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!vixen.cso.uiuc.edu!uwm.edu!reuter.cse.ogi.edu!netnews.nwnet.net!nntp.cac.washington.edu!math.washington.edu!hillman
From: hillman@math.washington.edu (Christopher Hillman)
Newsgroups: bionet.info-theory
Subject: Re: "Complexity" and Development
Date: 22 Feb 1995 22:16:05 GMT
Organization: "University of Washington, Mathematics, Seattle"
Lines: 70
Distribution: world
Message-ID: <3igd35$q5c@news.u.washington.edu>
References: <1995Feb22.133521.6482@alder.cc.kcl.ac.uk>
NNTP-Posting-Host: ionesco.math.washington.edu

In article <1995Feb22.133521.6482@alder.cc.kcl.ac.uk>,
udah289@bay.cc.kcl.ac.uk writes:

|> Suppose one had a simple genome of length n1 that we use in a developmental
|> model that starts out as a single cell and develops into some simple organism. If
|> we think of the developmental process whereby each cell divides etc as some
|> sort of Turing Machine, then one might want to say that some programs (ie
|> genomes) halt, some don't halt, and some eventually halt.

Could you elaborate on this?  I don't understand something here.
What biological event corresponds to the halting of the Turing machine?
Are you thinking of repressing certain genes and thus halting their further
effect on development?

|>But given that our
|>  genome is of length n1 we might expect that since there is only a certain 
|> amount of information contained within it. 

Yes, but it does not follow that all the information needed to describe the
cell (or organism) resides within the genome.  Some of this ``information''
is presumably encoded not within the genome but, in some complicated fashion,
in the physical structures present along with the genome when the germ cell
first divides.  In short, ``naked DNA'' cannot develop into even a simple
organism; you need some minimal ``infrastructure'' to start decoding the DNA
and transforming the instructions into actions which ``construct'' your organism.

|> Thus however long the developmental process runs for there is only a certain 
|> level of "complexity" that the completed organism can have.

It depends upon what you mean by ``complexity''.  I have a different notion
of biocomplexity (not yet made into a formal definition) according to which
the biocomplexity increases superadditively as the developing organism eats
and grows.  Of course, both our notions may be equally valid; I suspect that
eventually there will be a number of distinct notions of biocomplexity, each
useful in a different set of biological applications.  It seems to me that
if you are trying to describe development, a measure of complexity that
increases as development proceeds might capture our intuition that a
twenty year old human is considerably more complex than a single oocyte.

|> Is there some sort of measure of the "complexity" of an organism (or a
|> simulated organism more likely) that is along these lines?

I have had some thoughts along these lines (stimulated by the current work
on polynomial invariants of knots, links, and graphs) which I will attempt
to describe in a later post (time permitting).  To keep things simple, I
advocate first trying to define a notion of the complexity of a single cell
or even a biomolecular ``soup'', but I expect that if this can be done and
turns out to have interesting and useful applications to cellular biology,
a similar approach might work for higher levels of structure, up to and including
ecological biocomplexity.  For now let me just say that I suspect that the
notion of biocomplexity we seek is fundamentally different from what I have
called in earlier posts a Shannonian information theory (and from algorithmic
information theory).  Incidentally, I do feel that the question
``how much information is needed to describe an elephant, provided
that you are willing to do a lot of biomolecular work over many years to grow
your elephant from a single cell?'' is related to the question ``how much
information is needed to describe the Mandelbrot set, if you are willing
to do a large amount of computation to turn a coded description into an
actual copy of this set?''  But in my view the first question
is answered not by the notion of biocomplexity I alluded to above, but by
a distinct notion of biocomplexity vaguely related to (but certainly
distinct from) algorithmic complexity.

Incidentally, are you aware of Alan Turing's early work on development?
He wrote an influential paper arguing that if you have several ``inducers''
diffusing at different rates, you can produce striped or polka dotted patterns,
etc.   I think this idea has recently been used in various computer models
of ``how the leopard got its spots''.

Chris Hillman

From owner-info-theory@net.bio.net Wed Feb 22 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!ns1.faseb.org!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: Enthalpy and Entropy
Message-ID: <D4GpCx.K1t@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
References: <D49u80.GL7@ncifcrf.gov>
Date: Thu, 23 Feb 1995 16:48:33 GMT
Lines: 78

John Jungck gave me permission to post the following email.  These are
references that link enthalpy to the increase of entropy in the surroundings.
Thanks John!

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www.ncifcrf.gov:2001/~toms
soon: 
  http://www-lmmb.ncifcrf.gov/~toms
works now: 
  http://www-lmmb.ncifcrf.gov/

| jungck@beloit.edu wrote:
| >From jungck@beloit.edu  Thu Feb  9 20:55:30 1995
| Date: Thu, 9 Feb 1995 19:56:48 -0600
| Message-Id: <9502100156.AA21905@beloit.edu>
| To: ram
| From: jungck@beloit.edu
| Subject: Re: Enthalpy and Entropy
| 
| Dear Ram,
| 
| Here are a few to get you started:
| 
| Norman C. Craig, "Our Freshmen Like the Second Law." J. Chem. Ed. 47(5):
| 342 - 346 (May 1970).
| 
| "In this treatment the enthalpy function is then seen to be a natural
| consequence of shifting one's point of view from a global one to a
| reactive-system one."  
| 
| Norman C. Craig, "Entropy Analyses of Four Familiar Processes." J. Chem.
| Ed. 65: 760 - 764 (May 1970).
| 
| 
| Shimshon Novick, "No Energy Storage in Chemical Bonds." J. Biological
| Education 10 (3): 116 - 118 (1976).
| 
| Henry Bent, "The Second Law - How Much, How Soon, to How Many?" J. Chem.
| Ed. 47(5): 337 - 341 (May 1970).
| 
| Henry Bent, "Haste Makes Waste: Pollution and Entropy." Chemistry 44: 6 -
| 15 (October 1971)
| 
| Also of possible interest to you:
| 
| P. W. Hochachka et al., "Enthalpy-entropy Compensation of Oxamate Binding
| by Homologous Lactate Dehydrogenases." Nature 260: 648-649 (April 15,
| 1976).
| 
| me: "Thermodynamics of Self Assembly: An Empirical Example Relating Entropy
| and Evolution." pp. 101 - 109 In Duane L. Rohlfing and A. I. Oparin,
| editors, Molecular Evolution: Prebiological and Biological, Plenum Press:
| New York, (1972).
| 
| Sorry to be so out of date.  Just haven't worked in this arena in a long
| time.
| 
| I'd be curious in feedback after you read them (especially Craig and Bent).
| 
| Take care,
| 
| John
| 
| *************************************************
| John R. Jungck                                  *
| Department of Biology                           *
| Beloit College                                  *
| 700 College Street                              *
| Beloit, WI 53511                                *
| (608)-363-2267 office                           *
| (608)-363-2226 messages                         *
| (608)-363-2052 FAX                              *
| e-mail: jungck@beloit.edu                       *
| *************************************************

From owner-info-theory@net.bio.net Wed Feb 22 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!news.sprintlink.net!emerald.oz.net!news.worldcom.com!news.sesqui.net!rice!oekelly
From: oekelly@rice.edu (Owen Ernest Kelly)
Newsgroups: bionet.info-theory
Subject: Re: telomeres
Date: 22 Feb 1995 16:33:13 GMT
Organization: Rice University
Lines: 153
Distribution: world
Message-ID: <3ifp09$k5g@larry.rice.edu>
References: <3i5rfo$crj@mserv1.dl.ac.uk> <Pine.SUN.3.91.950220090759.2638C-100000@snfma1.if.usp.br>
Reply-To: oekelly@rice.edu (Owen Ernest Kelly)
NNTP-Posting-Host: riemann.rice.edu
Originator: oekelly@riemann.rice.edu


In article <Pine.SUN.3.91.950220090759.2638C-100000@snfma1.if.usp.br>, szeinfel@SNFMA1.IF.USP.BR (Rafael Iosef Najmanovich Szeinfeld {S) writes:
|> On 18 Feb 1995, gad yagil wrote:
|> 
|> > 
|> > Tim Murphy writes:
|> > | My question is simply whether H(s) where s is the DNA string
|> > | could make a sensible definition of the "complexity" of a creature.
|> > 
|> > Tom Schneider answers:
|> > >I believe that that is approaching the problem in a way
|> > > that is fruitless. Lots has been written about it and no measurements
|> > > and no conceptual advances  have been made!  We already know how
|> > > much DNA there is, and CoT curves of the "complexity" were done 30
|> > > years ago.  It doesn't teach us anything new.
|> > > (If you don't believe that, then tell me something new!)
|> > 
|> > Here is an example which may teach us something new:
|> > 
|> > These are the first 363 bases of the complete nucleotide sequence of
|> > yeast chromosome III:  (The end regions of a chromosome are called
|> > telomeres):
|> > 
|> > ~      1  CCCACACACC ACACCCACAC CACACCCACA CACCACACAC ACCACACCCA
|> >       51  CACACCCACA CCACACCACA CCCACACCAC ACCCACACAC CCACACCCAC
|> >      101  ACACCACACC CACACACACC ACACCCACAC ACACCCACAC CCACACACCA
|> >      151  CACCCACACA CACACCACAC CCACACACAC CACACCACAC CCACACCACA
|> >      201  CCCACACCCA CACACCACAC CCACACCCAC ACCCCACACC CACACACCAC
|> >      251  ACCCACACAC ACCACACCCA CACACACCCA CACCACACCC ACACACCACA
|> >      301  CCCACACACC CACACCCACA CACACCACAC CCACACCACA CCCACACCCA
|> >      351  CACACCCACA CCC (TAACAC....
|> > 
|> > This telomeric region, like other telomeric regions so far, is
|> > dominated by a single short motif: CCACAC. It is clearly less complex
|> > than most of genomic DNA. Here is the shortest program I managed to
|> > formulate today for the sequence:
 
            [ ... snip ... ]

|> > This string is probably not the
|> > shortest description of the sequence. The length of the algorithm is 83 steps -
|> > 63 to print the string and 20 more algorithmic steps are involved in
|> > the five definitions listed. This yields a relative complexity of
|> > 83/363 = 0.228. I challenge the mathematicians
|> > on this net to find and prove (if provable) a "super"algorithm to
|> > determine the real shortest program.
|> > 

The super-algorithm is the Lempel-Ziv compression algorithm. The Lempel-Ziv 
(self-)parsing of the given string requires 63 words and is given at the end
of this message. The first part of it looks like this 

                                    evaluates to    reconstructs to
     0                 0            ""              ""
     1     0     C     1     1      "C"             "C"
     2     1     C     2     3      "CC"            "CCC"
     3     0     A     1     4      "A"             "CCCA"
     4     1     A     2     6      "CA"            "CCCACA"
     5     4     C     3     9      "CAC"           "CCCACACAC"

with the following interpretation.
The actual representation that gets transmitted is the 2nd and 3rd columns,
read left to right, top to bottom: 0C 1C 0A 1A 4C ...

To reconstruct the sequence, we have to build the table as we go 
(first three columns only). The left hand column is the word number.
The empty string is the zeroth entry of the table.
The first word "0C" says to look up entry "0" of the table,
then concatenate "C" to it. Thus "0C" evaluates to "C" and this is appended to 
the (currently empty) reconstruction (last column). The 4th
column shows the length of the evaluated string, length("C")=1, and
the 5th column shows the running total of how much we have coded so far.
The next word is 1C, so we look up the first table entry "C" and concatenate "C"
to it to get "CC"; this is added to the reconstruction, and so on.
Whether this is actually shorter than the program proposed by
Tom Schneider depends on bit counting arguments (am I allowed to 
use 63 distinct words where he has used only a handful?).
I haven't gone through the motions on that issue.
Asymptotically though, the LZ algorithm does very well.
Whether you can gain any insight from the tree that the 
LZ algorithm constructs is another story altogether!
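
For readers who want to try the parsing and reconstruction described above,
here is a minimal Python sketch. It assumes the LZ78-style scheme shown in the
table (each word is the index of an earlier word plus one new letter); the
function names lz78_parse and lz78_rebuild are hypothetical, chosen only for
this illustration, and this is not the code that produced the table.

```python
def lz78_parse(sequence):
    """Parse a string into (table index, new letter) words, LZ78 style."""
    table = {"": 0}          # entry 0 is the empty string
    words = []
    current = ""
    for letter in sequence:
        if current + letter in table:
            current += letter                 # keep extending a known word
        else:
            words.append((table[current], letter))
            table[current + letter] = len(table)
            current = ""
    if current:
        # Input ended inside a known word: emit it as prefix index + last letter.
        words.append((table[current[:-1]], current[-1]))
    return words


def lz78_rebuild(words):
    """Rebuild the string by growing the same table while decoding."""
    table = [""]
    pieces = []
    for index, letter in words:
        word = table[index] + letter          # look up the entry, append the letter
        table.append(word)
        pieces.append(word)
    return "".join(pieces)


if __name__ == "__main__":
    start = "CCCACACAC"                       # first nine bases of the telomere
    print(lz78_parse(start))
    print(lz78_rebuild(lz78_parse(start)))
```

On the first nine bases the parse gives the words 0C 1C 0A 1A 4C, matching the
first five rows of the table.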

Regards,

Owen Kelly


     0                 0 (the empty string)  
     1     0     C     1     1
     2     1     C     2     3
     3     0     A     1     4
     4     1     A     2     6
     5     4     C     3     9
     6     5     A     4    13
     7     2     C     3    16
     8     3     C     2    18
     9     8     C     3    21
    10     8     A     3    24
    11     7     A     4    28
    12     6     C     5    33
    13    12     A     6    39
    14     5     C     4    43
    15    10     C     4    47
    16     2     A     3    50
    17    12     C     6    56
    18    17     A     7    63
    19    14     A     5    68
    20    14     C     5    73
    21    15     C     5    78
    22    21     C     6    84
    23    15     A     5    89
    24    11     C     5    94
    25     9     C     4    98
    26    23     C     6   104
    27    17     C     7   111
    28    26     A     7   118
    29    16     C     4   122
    30    25     A     5   127
    31    13     C     7   134
    32    29     A     5   139
    33    24     A     6   145
    34    19     C     6   151
    35    30     C     6   157
    36    28     C     8   165
    37    27     A     8   173
    38    31     C     8   181
    39    21     A     6   187
    40    20     A     6   193
    41    34     A     7   200
    42    33     C     7   207
    43    32     C     6   213
    44     9     A     4   217
    45    40     C     7   224
    46    35     A     7   231
    47     7     C     4   235
    48    22     A     7   242
    49    18     C     8   250
    50    46     C     8   258
    51    39     C     7   265
    52    50     A     9   274
    53    45     A     8   282
    54    43     C     7   289
    55    38     A     9   298
    56    53     C     9   307
    57    50     C     9   316
    58    31     A     8   324
    59    54     C     8   332
    60    51     A     8   340
    61    42     C     8   348
    62    38     C     9   357
    63    21     C     6   363
-- 
 

From owner-info-theory@net.bio.net Wed Feb 22 22:00:00 1995
Path: biosci!SNFMA1.IF.USP.BR!szeinfel
From: szeinfel@SNFMA1.IF.USP.BR (Rafael Iosef Najmanovich Szeinfeld {S)
Newsgroups: bionet.info-theory
Subject: Re: Self- Organization and Information Theory
Date: 23 Feb 1995 08:46:19 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 42
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <Pine.SUN.3.91.950223133843.4555D-100000@snfma1.if.usp.br>
References: <3iiae9$7fr@newsbf02.news.aol.com>
NNTP-Posting-Host: net.bio.net

On 23 Feb 1995, HPYockey wrote:

> Subject: Self-organization; Information Theory
> kmatsuno@VOSCC.NAGAOKAUT.AC.JP (Koichiro Matsuno/7129)
> writes 18 Feb 1995  Message ID:<9502181308.AA00134@voscc.nagaokaut.ac.jp>
> com
> 
	All deleted up to here.

> 
> After my 1981 paper, professor Sidney Fox and K. Matsuno suffered in
> silence until they sent a Letter to the Editor 101 321-323 (1983). It took
> some time but in this letter they had at last discovered "some of the
> unidentified assumptions in Yockey's (1981) interpretation of information
> theory relative to concepts of self-organization in the origin of life".
> They continued with the sentence quoted by Matsuno and commented further:
> "The essential fallacy of this back-extrapolation for proteins is the
> implied dependence upon a one-to-one correspondence in amino acid residues
> from one stage to another, especially with respect to information
> content." 

	While reading this paragraph, I was reminded of a recent paper 
on the evolution of the genetic code using group theory. It appeared in 
Physical Review Letters, 1993, December or November issue. 

title:	Algebraic approach to the evolution of the genetic code.
author: J.E. Hornos and I. Hornos

	
If anyone is interested and cannot find the article, I will try to find 
it.

>all the rest deleted too.
					Rafael.
*---------------------------------------------------------------------*
* Rafael Iosef Najmanovich Szeinfeld | Depto. de Fisica Geral         *
* Statistical Mechanics Group        | Instituto de Fisica            *
* General Physics Department         | Universidade Sao Paulo         *
* Physics Institute                  | Rua do Matao s/n CEP 01452-990 *     
* University of Sao Paulo - Brazil   | Caixa Postal 20516             * 
* E-mail: szeinfel@snfma1.if.usp.br  | Sao Paulo - Brasil             *
*---------------------------------------------------------------------*

From owner-info-theory@net.bio.net Wed Feb 22 22:00:00 1995
Path: biosci!CS.SANDIA.GOV!scistra
From: scistra@CS.SANDIA.GOV (Sorin C. Istrail)
Newsgroups: bionet.info-theory
Subject: Call For Papers
Date: 23 Feb 1995 14:06:25 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 78
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <9502232205.AA13385@frodo.cs.sandia.gov.noname>
NNTP-Posting-Host: net.bio.net



           DISCRETE APPLIED MATHEMATICS

           Announcing a Special Issue on
          Computational Molecular Biology

Guest Editors: Sorin Istrail, Pavel Pevzner, Ron Shamir

Submission Deadline: May 31, 1995

Manuscripts are solicited for a special issue of 
DISCRETE APPLIED MATHEMATICS on topics concerning 
the development of new combinatorial and algorithmic 
techniques in computational molecular biology. 
With this announcement "Discrete Applied Mathematics"  
starts a series of special issues devoted to 
computational biology. This series will publish papers 
on the mathematical and algorithmic foundations of 
the inherently discrete aspects of computational biology.

The traditional partnership of mathematics and physics
has advanced and enriched both disciplines. In a similar 
partnership, mathematics and algorithms are becoming 
crucial tools in the rapid advancement of molecular biology. 
At the same time, the computational challenges of these 
biological disciplines raise exciting new problems in 
discrete mathematics and theoretical computer science. 
To paraphrase Stan Ulam, those challenges reflect 
"not only what mathematics can do for biology but what 
biology can do for mathematics."

The following is a (nonexhaustive) list of possible 
topics of interest for the special issue:

        - DNA mapping

        - DNA sequencing

        - DNA/protein sequence comparison

        - molecular evolution

        - RNA/protein structure 

        - gene/motif recognition 

Seven (7) copies of complete manuscripts should be sent to 
any of the Guest Editors indicated below by May 31, 1995. 
We expect the Special Issue to appear in Fall 1996. 
Manuscripts must be prepared according to the normal 
submission requirements of Discrete Applied Mathematics, 
as described in each issue of the journal.  The Guest Editors
will make every possible effort to assure the timely completion
of a thorough refereeing process.

The Guest Editors are:

  Sorin Istrail 
  Sandia National Laboratories
  Algorithms and Discrete Mathematics Department
  MS 1110
  Albuquerque, NM 87185-1110
  scistra@cs.sandia.gov

  Pavel Pevzner
  Dept. of Computer Science and Engineering
  The Pennsylvania State University
  University Park, PA 16802 
  pevzner@cse.psu.edu  

  Ron Shamir 
  Dept. of Computer Science 
  Tel Aviv University 
  Tel Aviv 69978 
  ISRAEL  
  shamir@math.tau.ac.il


From owner-info-theory@net.bio.net Wed Feb 22 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!ns1.faseb.org!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: telomeres
Message-ID: <D4H2u3.53o@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
References: <3i5rfo$crj@mserv1.dl.ac.uk> <Pine.SUN.3.91.950220090759.2638C-100000@snfma1.if.usp.br> <3ifp09$k5g@larry.rice.edu>
Date: Thu, 23 Feb 1995 21:39:39 GMT
Lines: 34

In article <3ifp09$k5g@larry.rice.edu> oekelly@rice.edu (Owen Ernest Kelly) writes:

>The super-algorithm is the Lempel-Ziv compression algorithm.

Thanks for the description of Lempel-Ziv.  There is another nice description
in:

@book{Lucky1989,
author = "R. W. Lucky",
title = "Silicon dreams: information, man, and machine",
publisher = "St. Martin's Press",
address = "New York",
isbn = "0-312-02960-8",
year = "1989"}

(Lucky is president [I think] of AT&T, and writes very clearly.)

My understanding is that the Lempel-Ziv algorithm approaches the Shannon
H boundary.  This is different from the algorithmic complexity though.
What happens when you try Lempel-Ziv on the first million digits of pi?
(http://akebono.stanford.edu/yahoo/Science/Mathematics/Numbers/PI !!)
This could be compared to a program that makes pi
(http://www.ccsf.caltech.edu/~roy/pi.c).
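
To make the contrast concrete, here is a rough Python sketch. It estimates H
from single-symbol frequencies only, which is an assumption of this sketch:
Shannon's H for a source also takes correlations between symbols into account.
The function name shannon_entropy is hypothetical. For the digits of pi one
would expect an estimate near log2(10), about 3.32 bits per digit, even though
a short program can generate them.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Bits per symbol, estimated from single-symbol frequencies only."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy("ACGT"))      # four equally likely bases
    print(shannon_entropy("CCACAC"))    # the telomere motif, C-rich
```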

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www.ncifcrf.gov:2001/~toms/
soon: 
  http://www-lmmb.ncifcrf.gov/~toms/
works now: 
  http://www-lmmb.ncifcrf.gov/

From owner-info-theory@net.bio.net Wed Feb 22 22:00:00 1995
Path: biosci!adam.cc.sunysb.edu!news.nysernet.net!news.sprintlink.net!pipex!sunsite.doc.ic.ac.uk!alder.cc.kcl.ac.uk!udah289
From: udah289@bay.cc.kcl.ac.uk
Newsgroups: bionet.info-theory
Subject: Re: "Complexity" and Development
Message-ID: <1995Feb23.182805.6512@alder.cc.kcl.ac.uk>
Date: 23 Feb 95 18:28:05 GMT
References: <1995Feb22.133521.6482@alder.cc.kcl.ac.uk> <3igd35$q5c@news.u.washington.edu>
Distribution: world
Organization: King's College London
Lines: 107


> |> Suppose one had a simple genome of length n1 that we use in a developmental
> |> model that starts out as a single cell and develops into some simple organism. If
> |> we think of the developmental process whereby each cell divides etc as some
> |> sort of Turing Machine, then one might want to say that some programs (ie
> |> genomes) halt, some don't halt, and some eventually halt.
> 
> Could you elaborate on this?  I don't understand something here.
> What biological event corresponds to the halting of the Turing machine?
> Are you thinking of repressing certain genes and thus halting their further
> effect on development?

OK, I'm planning on modelling cell development by a graph generation algorithm
which starts from a single point and develops into a completed graph. The nodes
of the graph are all Turing Machines (running the same Program) -this is based
on Ray Paton's work on the Cell as a Turing Machine. The way it is to be set
up, the states of the Turing Machine correspond to the possible cell types, the
actions that the machine performs correspond to cell transformations, e.g.
division and morphogen release. The input to each machine is the morphogen
levels on the node's edges (If this doesn't make sense, I have a properly
written description somewhere, I just haven't figured out how to paste stuff
into the newsgroup editor yet!).

I don't think that halting corresponds to any particular "event" as such.
However in normal development cell division does halt (I'm particularly
thinking of organisms like C. elegans whose precise developmental pathways are
known). Maybe (under this model) non-halting algorithms correspond to
neoplastic growth.  (I should say that I am fairly new to this field, so if
anyone thinks I am labouring under any conceptual fallacies, please tell me)
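
As a toy illustration of the halting idea only, here is a Python sketch. It is
a deliberate simplification and not the model described above: cells are plain
type labels rather than Turing Machines, there are no morphogens or graph
edges, and the name develop and the example cell types are hypothetical. A
"program" maps a cell type to the daughter types it divides into; development
"halts" when no remaining cell type can divide.

```python
def develop(program, start="zygote", max_steps=100, max_cells=1000):
    """Run a toy lineage program; return (cells, halted).

    program maps a cell type to the list of daughter types it divides into.
    Cell types that are not keys of program never divide.
    The step and cell caps are arbitrary cut-offs for this sketch.
    """
    cells = [start]
    for _ in range(max_steps):
        if not any(cell in program for cell in cells):
            return cells, True                # nothing left to divide: halted
        cells = [d for cell in cells for d in program.get(cell, [cell])]
        if len(cells) > max_cells:
            break                             # give up: growth looks unbounded
    return cells, not any(cell in program for cell in cells)


if __name__ == "__main__":
    halting = {"zygote": ["AB", "P1"], "AB": ["skin", "neuron"]}
    runaway = {"zygote": ["zygote", "zygote"]}   # every cell keeps dividing
    print(develop(halting))
    print(develop(runaway)[1])
```

The second program never reaches a state where division stops, which is the
sense in which a non-halting program might be compared to neoplastic growth.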

 
> |>But given that our
> |>  genome is of length n1 we might expect that since there is only a certain 
> |> amount of information contained within it. 
> 
> Yes, but it does not follow that all the information needed to describe the
> cell (or organism) resides within the genome.  Some of this ``information''
> is presumably encoded not within the genome but, in some complicated fashion,
> in the physical structures present along with the genome when the germ cell
> first divides.  In short, ``naked DNA'' cannot develop into even a simple
> organism; you need some minimal ``infrastructure'' to start decoding the DNA
> and transforming the instructions into actions which ``construct'' your organism.
> 
Agreed!

> |> Thus however long the developmental process runs for there is only a certain 
> |> level of "complexity" that the completed organism can have.
> 
> It depends upon what you mean by ``complexity''.  I have a different notion
> of biocomplexity (not yet made into a formal definition) according to which
> the biocomplexity increases superadditively as the developing organism eats
> and grows.  Of course, both our notions may be equally valid; I suspect that
> eventually there will be a number of distinct notions of biocomplexity, each
> useful in a different set of biological applications.  It seems to me that
> if you are trying to describe development, a measure of complexity that
> increases as development proceeds might capture our intuition that a
> twenty year old human is considerably more complex than a single oocyte.
> 
"Biocomplexity" - I like the sound of that! This is definitely the sort of
intuition I am trying to capture.  


> |> Is there some sort of measure of the "complexity" of an organism (or a
> |> simulated organism more likely) that is along these lines?
> 
> I have had some thoughts along these lines (stimulated by the current work
> on polynomial invariants of knots, links, and graphs) which I will attempt
> to describe in a later post (time permitting).  To keep things simple, I
> advocate first trying to define a notion of the complexity of a single cell
> or even a biomolecular ``soup'', but I expect that if this can be done and
> turns out to have interesting and useful applications to cellular biology,
> a similar approach might work for higher levels of structure, up to and including
> ecological biocomplexity.  For now let me just say that I suspect that the
> notion of biocomplexity we seek is fundamentally different from what I have
> called in earlier posts a Shannonian information theory (and from algorithmic
> information theory).  Incidently, I do feel that the question
> ``how much information is needed to describe an elephant, provided
> that you are willing to do alot of biomolecular work over many years to grow
> your elephant from a single cell?'' is related to the question ``how much
> information is needed to describe the Mandlebrot set, if you are willing
> to do a large amount of computation to turn a coded description into an
> actual copy of this set?''--- but in my view the first question
> is answered not by the notion of biocomplexity I alluded to above, but by
> a distinct notion of biocomplexity vaguely related to (but certainly
> distinct from) algorithmic complexity.
> 
I think your connection of Biocomplexity to the Mandelbrot set description is
right on the mark. I have been thinking along these lines myself. Possibly
Biocomplexity might be related to some sort of measure of the fractal dimension
of a cell lineage tree, or something along those lines. I would definitely be
interested in reading anything you have come up with on these lines. 

> Incidentally, are you aware of Alan Turing's early work on development?
> He wrote an influential paper arguing that if you have several ``inducers''
> diffusing at different rates, you can produce striped or polka dotted patterns,
> etc.   I think this idea has recently been used in various computer models
> of ``how the leopard got its spots''.
> 

Indeed I have. I am hoping that I may be able to get my model to do this at
some point.

Thanks for your comments

Steve Hill
(s.hill@bay.cc.kcl.ac.uk)


From owner-info-theory@net.bio.net Wed Feb 22 22:00:00 1995
Path: biosci!daresbury!not-for-mail
From: gad yagil <LCYAGIL@WEIZMANN.weizmann.ac.il>
Newsgroups: bionet.info-theory
Subject: telomere
Date: 23 Feb 1995 00:22:46 -0000
Lines: 45
Sender: lpddist@mserv1.dl.ac.uk
Distribution: bionet
Message-ID: <3igkgm$2rr@mserv1.dl.ac.uk>
X-Acknowledge-To: <LCYAGIL@WEIZMANN>
Original-To: bio-info@dl.ac.uk



Yagil presented a calculation of the algorithmic complexity of a yeast
telomere.

Raphael Iosef Najmanovich Szeinfeld comments:

; This example is quite interesting since you have a sequence
; obviously organized so you expect it to contain a large amount of
; information (that may seem to be directly proportional to complexity)
; but using this definition of complexity (the ratio between the minimal
; amount of data that uniquely describes your system and the amount of
; data actually used to describe it) you find a small complexity ratio.
; The point is that, if you can properly describe your system in a
; simple form, is it really complex or were you counting noise as
; information?

 I am certainly not counting noise, because whenever DNA is extracted
from yeast cells, be it in Sao Paulo, Rehovot or Dublin
(except for possible strain differences), one will always observe the same
sequence at the telomere of chromosome III. This is a fundamental
difference from noise: a noisy source will emit a DIFFERENT sequence of
signals each second; you will never get the same sequence, not even from
the same source in Sao Paulo.


   The reason for the low complexity of the 363 base sequence is the
repeated CCACAC sequence; it appears 47 times, i.e. 282 bases. Complexity
is determined by the amount of regularity (repeat of elements) in the
sequence, NOT by the total amount of order (or organization if you wish).
To put it another way: an ordered sequence is highly complex if it
contains NO regular elements. Noise is just random, not complex. How can
we know whether a sequence is random (just noise) or ordered?
This cannot be established by mere inspection of a single sequence.
Rather, a sufficient number of samples has to be inspected,
sampled at different times or locations. Only if they are found to be at
least partly identical can a meaningful complexity value be assigned.
For more details see the papers mentioned last time (sent today
to six requesters).
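
As a rough illustration of the arithmetic above, the following Python sketch
counts how much of a sequence is covered by one repeated unit. It is not
the complexity measure itself, which is defined in the papers referred to.
The function name repeat_coverage and the toy sequence are made up for
illustration; only the figures 363, 47 and the unit CCACAC come from the
post.

```python
def repeat_coverage(seq: str, unit: str):
    """Return (copies, bases covered, fraction of seq covered) for one repeat unit.

    str.count counts non-overlapping occurrences, scanning left to right.
    """
    copies = seq.count(unit)
    covered = copies * len(unit)
    return copies, covered, covered / len(seq)


# Hypothetical toy strand (NOT the real telomere sequence): 5 copies of the
# unit flanked by arbitrary bases.
toy = "GATTACA" + "CCACAC" * 5 + "TTGA"
toy_copies, toy_covered, toy_fraction = repeat_coverage(toy, "CCACAC")

# The figures quoted in the post: 47 copies of a 6-base unit in 363 bases.
post_covered = 47 * len("CCACAC")
post_fraction = post_covered / 363

print(toy_copies, toy_covered, round(toy_fraction, 3))
print(post_covered, round(post_fraction, 3))
```

With the post's numbers the repeat accounts for a little over three quarters
of the sequence, which is why a description of the form "47 copies of CCACAC
plus the remainder" is so much shorter than the sequence written out in full.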


Gad Yagil,
Dept. of Cell Biology,
The Weizmann Institute, Rehovot, Israel, 76100.
LCYAGIL@WEIZMANN.WEIZMANN.AC.IL

From owner-info-theory@net.bio.net Wed Feb 22 22:00:00 1995
Path: biosci!rutgers!gatech!howland.reston.ans.net!cs.utexas.edu!news.sprintlink.net!uunet!newstf01.news.aol.com!newsbf02.news.aol.com!not-for-mail
From: hpyockey@aol.com (HPYockey)
Newsgroups: bionet.info-theory
Subject: Self- Organization and Information Theory
Date: 23 Feb 1995 10:43:05 -0500
Organization: America Online, Inc. (1-800-827-6364)
Lines: 136
Sender: root@newsbf02.news.aol.com
Message-ID: <3iiae9$7fr@newsbf02.news.aol.com>
Reply-To: hpyockey@aol.com (HPYockey)
NNTP-Posting-Host: newsbf02.mail.aol.com

Subject: Self-organization; Information Theory
kmatsuno@VOSCC.NAGAOKAUT.AC.JP (Koichiro Matsuno/7129)
writes 18 Feb 1995  Message ID:<9502181308.AA00134@voscc.nagaokaut.ac.jp>

How nice to hear from Professor Koichiro Matsuno who responds to my
request for comments on self-organization of Manfred Eigen and the origin
of life! He quotes a sentence from my paper in Journal of Theoretical
Biology  91 pages 13-31 (1981). 

I reread this paper and I find it one of the best I have ever written. It
began with a quote from George Gaylord Simpson's classic paper "The
nonprevalence of humanoids", Science 143 p769 (1964).

Simpson called the search for extraterrestrial life "a gamble of the most
adverse odds in history". We are now 31 years after Simpson's paper and
less is known (sic) than in 1964. Aficionados have lost their cherished
reducing atmosphere and, worst of all, the primeval soup essential to all
"origin of life" scenarios has faded away like a Cheshire cat leaving only
the smile. For justification of that remark see chapter 8 in Information
Theory and Molecular Biology and my reply to Lifson in BioEssays January
1995.

My 1981 paper showed that information theory proves the impossibility that
life originated by a self-organization process, whatever that may be. I
enlarged this argument in Chapter 10 of Information Theory and Molecular
Biology, especially section 10.3 where I discussed the essential fallacies
of Fox's proteinoids.

Proteinoids are neither new nor original with Fox. I put the following
epigraph at the beginning of Chapter 12. 
"I am in point of fact, a particularly haughty and exclusive person, of
pre-Adamic ancestral descent. You will understand this when I tell you
that I can trace my ancestry back to a protoplasmal primordial atomic
globule." Pooh-Bah in The Mikado Act 1 by W. S. Gilbert. Obviously,
Gilbert and his audiences through the years have thought it a huge joke to
put this in the mouth of a comic, pompous, ostentatious official.

After my 1981 paper, professor Sidney Fox and K. Matsuno suffered in
silence until they sent a Letter to the Editor 101 321-323 (1983). It took
some time but in this letter they had at last discovered "some of the
unidentified assumptions in Yockey's (1981) interpretation of information
theory relative to concepts of self-organization in the origin of life".
They continued with the sentence quoted by Matsuno and commented further:
"The essential fallacy of this back-extrapolation for proteins is the
implied dependence upon a one-to-one correspondence in amino acid residues
from one stage to another, especially with respect to information
content." 

Sidney Fox was the editor of the Proceedings of the Liberty Fund
Conference on Selforganization held in 1984 at Key Biscayne Florida. He
was careful to see that I was not invited. He does not mention my paper of
1981 in the Proceedings.

He contributed an article to the book Science and Creationism edited by
Ashley Montagu 1984 published by Oxford. In this article he comes up with
these and other bloopers: page 203; "Some recognize that proteins in
modern organisms are nonrandom for the set of molecules as a whole, this
kind of non randomness (or order) being derivative of the action of DNA
and RNA." "The instructions for protobiotic proteins have been explained
experimentally as arising from the amino acids themselves. This novel
experimental fact is crucial to the understanding of the origin of life." 
Page 204 "Since the order in the amino acids results from no materials
other than the reactants, mixed amino acids must order themselves." (I
wonder what tRNA is doing!)

Page 214 "Proteinoids are in the main much like proteins, but the very
name indicates they are not proteins." A rose by any other name...

On page 198 Sidney Fox referred to my 1981 paper and attempted to imply
that I am a creationist: "Many authors, such as Darnbough et al (1981 a,
b) and Yockey (1981), feel they can reconcile scripture and science;
mostly these are not creationists in the sense of the ICR fundamentalists."
Following on page 199 he says: "The self-organizationist theme is accepted
by numerous scientists, e. g. Eigen (1971), Kuhn (1981), Matsuno (1981),
although not by others such as Yockey (1981), whose numerous quotations
from scripture in his critique of self-organization raises a question of
the purity of his scientific premises." 

My quotations from scripture and other sources, including Homer and
Jonathan Swift's Gulliver's Travels, available to educated men, are
literary allusions used by all competent authors that illuminate the point
under discussion. Any well-educated and well-read reader would understand
that. Fox's objection is strictly de minimis. 

Haldane, who, although a communist and an atheist, was an educated man and
a very competent writer, noted in a paper published in 1954 that the idea
that the oceans brought forth life is an old one. Haldane quoted Genesis
1:10 "And God called the dry land Earth; and the gathering of the waters
He called Seas: and God saw that it was good." Genesis 1:20: "And God
said, Let the waters bring forth abundantly the moving creature that hath
life, and fowl that may fly above the Earth in the open firmament of
Heaven." 

In Information Theory and Molecular Biology I made it quite clear that the
role of religion is different from that of science. 

Matsuno continues in his posting: "Yockey's emphasis on the
Shannon-McMillan theorem is sound and unquestionable."  "Its corollary,
however, is that information thus conceived is synchronic in the sense
that the source matrix of information or how to partition the probability
space, once fixed, remains invariant in time." 
This is the old confusion of information and meaning that I discussed in
the book and in my paper in January BioEssays.

In the first place, the Shannon-McMillan Theorem tells us that if the
probabilities are not all equal we may divide the total number of
sequences in the ensemble into two groups. The number of sequences
in one group, called the high probability group, is 2^(NH), where N is the
number of elements in the sequence and H is the Shannon entropy. The
sequences in the other group, called the low probability group, have
negligible total probability and may be ignored. Thus the number of
sequences in a chain of 100 amino acids, such as cytochrome c, that one
needs to be concerned with is not 20^100 but 2^(100 x 3.3966), a number
smaller by a factor of about 10^27. See Table 6.4 in my book. 

This result is virtually unknown by molecular biologists. Be the first in
your neighborhood to amaze your colleagues!

Problem for the student: calculate these two numbers to see what fraction
of the total possible lie in the high probability group. 
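
One possible way to work the student problem is sketched below in Python.
It uses only the figures given above (20 amino acids, N = 100, H = 3.3966
bits per residue) and works in base-10 logarithms, since both counts are
far too large to compare comfortably otherwise. The variable names are
illustrative, not taken from the book.

```python
import math

N = 100        # residues in the chain (from the post)
H = 3.3966     # Shannon entropy in bits per residue (from the post)
ALPHABET = 20  # number of amino acids

# log10 of the total number of sequences, 20^N
log10_total = N * math.log10(ALPHABET)

# log10 of the size of the high probability group, 2^(N*H)
log10_high = N * H * math.log10(2)

# The fraction in the high probability group is 10^(-gap)
gap = log10_total - log10_high

print(f"total sequences        ~ 10^{log10_total:.2f}")
print(f"high probability group ~ 10^{log10_high:.2f}")
print(f"fraction in that group ~ 10^-{gap:.2f}")
```

Under these inputs the high probability group is smaller than the full set
by nearly 28 orders of magnitude, the same ballpark as the rough factor
quoted above.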

 Perhaps in another posting Matsuno will tell us what "synchronic
information" is. 

I thank Matsuno for pointing out my "keen warning against muddling" the
meaning of words. 

A large part of the confusion in Fox's  writings lies in the word-traps
random, order, disorder etc. See my explanation in BioEssays January 1995
and in Information Theory and Molecular Biology. 

For an explanation of the modern mathematical notion of random see When is
random random? Nature 344 p823 (1990)

Auf Wiedersehen, Hubert 

From owner-info-theory@net.bio.net Wed Feb 22 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!vixen.cso.uiuc.edu!uwm.edu!msunews!harbinger.cc.monash.edu.au!bruce.cs.monash.edu.au!lloyd
From: lloyd@cs.monash.edu.au (Lloyd Allison)
Subject: small UTM
Message-ID: <lloyd.793496965@bruce.cs.monash.edu.au>
Summary: smallest UTM ?
Keywords: universal Turing machine
Sender: news@bruce.cs.monash.edu.au (USENET News System)
Organization: Computer Science, Monash University, Australia
Date: Wed, 22 Feb 1995 23:49:25 GMT
Lines: 9

Does anyone know (refs) what the state of the art is in the search for
the smallest (various notions of size) Universal Turing Machine ?

Lloyd ALLISON
Central Inductive Agency,
Dept. of Computer Science, Monash University, Clayton, Victoria 3168, AUSTRALIA
tel: 61 3 905 5205       fax: 61 3 905 5146       email: lloyd@cs.monash.edu.au
<A HREF="http://www.cs.monash.edu.au/~lloyd">LA home</A>


From owner-info-theory@net.bio.net Wed Feb 22 22:00:00 1995
Path: biosci!rutgers!uwm.edu!vixen.cso.uiuc.edu!howland.reston.ans.net!news.cac.psu.edu!news.tc.cornell.edu!travelers.mail.cornell.edu!newstand.syr.edu!jdimpson
From: jdimpson@newstand.syr.edu (Masterpiece Theater)
Newsgroups: bionet.info-theory
Subject: Chaos Theory
Date: 23 Feb 1995 16:24:24 GMT
Organization: the peak of Mt. Olympus, Syracuse U.
Lines: 16
Message-ID: <3iicro$rq6@newstand.syr.edu>
Reply-To: jdimpson@mailbox.syr.edu
NNTP-Posting-Host: forbin.syr.edu
X-Newsreader: TIN [version 1.2 PL2]

For my Introduction to Neurobiology class I'm taking here at Syracuse
University (taught by Dr. Robert Barlow) I have chosen to report for a final
project on known occurrences of patterns of chaos found in human
physiology (preferably in the nervous system). Can anyone give me any
information on the topic?  A reference to some printed work would be
great, and any correspondence with people who have actually experimented
with the topic would be ideal (in other words, really really really great).
Thank you.

--
Jeremy Impson		       http://web.syr.edu/~jdimpson/

	Undergraduate, Syracuse University, Syracuse, NY
	Computer Science and Spanish Literature & Language		

			

From owner-info-theory@net.bio.net Thu Feb 23 22:00:00 1995
Path: biosci!VOSCC.NAGAOKAUT.AC.JP!kmatsuno
From: kmatsuno@VOSCC.NAGAOKAUT.AC.JP (koichiro matsuno/7129)
Newsgroups: bionet.info-theory
Subject: Re: Self- Organization and Information Theory
Date: 24 Feb 1995 03:49:11 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 54
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <9502241152.AA01760@voscc.nagaokaut.ac.jp>
NNTP-Posting-Host: net.bio.net

   In article <3iiae9$7fr@newsbf02.news.aol.com>  hpyockey@aol.com (HPYockey)
writes:

>This result [the Shannon-McMillan Theorem] is virtually unknown by molecular 
>biologists. Be the first in
>your neighborhood to amaze your colleagues!

   Wait, Hubert! Don't forget ecologists, though they may not belong to the 
high probability group of biologists at this moment. During the sixties, Ramon
Margalef made an observation on ecological succession and noted that the 
biomass production rate (nearly equal to the dissipation rate) per unit 
biomass present in an ecosystem decreases with its succession. If the coming 
and going of biomass follow a Markovian process, what Margalef observed would
be the prevalence of the high probability sequences in ecosystems. I seconded
Margalef's claim while starting from a general scheme of Markovian kinetics
in my 1978 paper (J. Theor. Biol. 70, 23-31).

>Perhaps in another posting Matsuno will tell us what "synchronic
>information" is.

   Deciphering the synchronic/diachronic dichotomy of anthropologically-
linguistically-philosophical origin is tough for me. Let me try only a little 
bit here. My mail-ordered Webster's 10th Edition lists diachronic as "of, 
relating to, or dealing with phenomena (as of language or culture) as they 
occur or change over a period of time", and synchronic as "concerned with 
events existing in a limited time period and ignoring historical antecedents". 
Shannon's information is synchronic because it ignores historical 
antecedents when counted. Entropy referred to within the Shannon-McMillan 
scheme is also ahistorical.

   In contrast, if you consult Carl Friedrich von Weizsacker saying 
"Information is only that which produces information" (Die Einheit der 
Natur, p.351), information thus referred to will be historical and 
diachronic because of its persistent reference to the antecedents.

   Now, two choices are in front of us, either to discard diachronic 
information simply as nonsense and stick to the synchronic one, or to salvage 
diachronic information even reluctantly at the cost of the other company. My 
bet is for the latter simply because of the historicity of our presence. 
Don't get me wrong, though. Synchronic information constitutes a great 
edifice thanks to Shannon and his company, who made a completely sanitized 
form of information by depriving it of anything smacking of diachronic flavor.

   Regards,
   Koichiro

   Koichiro Matsuno
   Department of BioEngineering 
   Nagaoka University of Technology
   Nagaoka 940-21, Japan

   kmatsuno@voscc.nagaokaut.ac.jp
 


From owner-info-theory@net.bio.net Thu Feb 23 22:00:00 1995
Path: biosci!BELOIT.EDU!jungck
From: jungck@BELOIT.EDU
Newsgroups: bionet.info-theory
Subject: Re: Self- Organization and Information Theory
Date: 23 Feb 1995 17:14:28 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 28
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <9502240113.AA16587@beloit.edu>
NNTP-Posting-Host: net.bio.net

I am glad to see someone else who has enjoyed the Hornos piece.  The editor
of Nature also gave it front billing in a lead editorial shortly after it
appeared.

Martha Bertman and I published a different group theoretical approach
(based on Klein-4 groups and their products):

"Group Graph of the Genetic Code" in J. Heredity (1979)

Also Gary Findley and others have a book out on group theory and genetic
coding that employs cyclic groups that appeared in the 80's; I believe it
is called the geometry of genetics.

However, Andrews and Boss did a group theoretic approach before either of
us back in the early sixties.

*************************************************
John R. Jungck                                  *
Department of Biology                           *
Beloit College                                  *
700 College Street                              *
Beloit, WI 53511                                *
(608)-363-2267 office                           *
(608)-363-2226 messages                         *
(608)-363-2052 FAX                              *
e-mail: jungck@beloit.edu                       *
*************************************************


From owner-info-theory@net.bio.net Thu Feb 23 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!uunet!nctuccca.edu.tw!netnews.ntu.edu.tw!b3203054
From: b3203054@cc.ntu.edu.tw (Kai-hsu TAI)
Newsgroups: bionet.info-theory
Subject: biochemical computing
Date: 24 Feb 1995 12:58:19 GMT
Organization: National Taiwan University
Lines: 70
Message-ID: <3ikl5b$dh4@netnews.ntu.edu.tw>
NNTP-Posting-Host: b3203054@ccsun24.cc.ntu.edu.tw
X-Newsreader: TIN [version 1.2 PL2]

I just finished this stuff as a premature idea, and don't
know if it is appropriate for an undergrad freshman like me to 
post this radical (even stupid) proposal here, but I just want
to know what you think about this, so I decided to take the
risk.  There may be some bugs in this, and some phrases may 
not be the standard biochemical terminology, so please 
bear with me.
 
Kai-hsu TAI
BIOCHEMICAL COMPUTING PROPOSAL, OR WHATEVER YOU WANT TO CALL IT
===============================================================
 
OK, now for that biochemical computing stuff.  Let us assume
that we have a "add-ase", which is a kind of enzyme/protein
capable of adding 2 strands in the following way:
 
We have DNA sequences
 
AG = 1 = 10^0,
AAG = 10 = 10^1,
(A)nG = 10^(n-1).
 
CG represents the plus sign (+).
 
Then we program/synthesize a sequence
 
AGCGAG representing the expression "1 + 1".
 
We put the sequence into a beaker (?) containing the
"add-ase" and then analyze the resulting DNA sequences.  If
we are correct in the lab procedure and are lucky enough, we
shall have sequences
 
AGAG (the result sequence),
CG (the mathoperator) and
AGCGAG (the original sequence)
 
What the add-ase does is simply get to the GCG part,
cut out one CG, and then connect the residual
"mathnumeral" parts, i.e.,
 
 
           ooo       oxx       vov
AGCGAG -> AGCGAG -> AGCGAG -> AG AG -> AGAG
                           `> CG
o represents the add-ase attaching site; x represents the
cutting site; v represents the connecting site.
 
We can also have some "carry-ase", of which the
biochemputing diagram is
 
                ooooooooo      xxxxxoxoo     vovov
z(AG)8AGAGy -> z(AG)8AGAGy -> z(AG)8AGAGy -> z A AGy -> zAAGy
                                          `> (AG)8 + G
 
where y and z represent any specific sequences.  I think
this carry-ase has some defects (bugs) but I also think it
can be corrected quite easily.  (Wait for the patch?  Oh,
spare your humor.)
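
The two hypothetical enzymes above can be mimicked as plain string rewriting.
The Python sketch below is only a toy model of the encoding in this post
(AG = 1, AAG = 10, CG = "+"); it says nothing about real biochemistry. The
function names and the generalisation of the carry rule from (AG)10 -> AAG
to any power of ten are assumptions made for illustration. Note that this
carry only fires on ten adjacent identical units, which may be one of the
"bugs" alluded to above.

```python
import re

PLUS = "CG"  # the post's encoding of "+"


def encode(n: int) -> str:
    """Write n as a strand: each (A)kG unit stands for 10^(k-1)."""
    digits = str(n)
    strand = ""
    for pos, d in enumerate(digits):
        power = len(digits) - 1 - pos
        strand += ("A" * (power + 1) + "G") * int(d)
    return strand


def decode(strand: str) -> int:
    """Sum the values of all (A)kG units in a strand."""
    return sum(10 ** (len(unit) - 2) for unit in re.findall(r"A+G", strand))


def add_ase(strand: str) -> str:
    """Cut out every CG operator and join the remaining numeral parts."""
    return strand.replace(PLUS, "")


# Ten adjacent copies of the same unit; the lookbehind keeps the match
# from starting in the middle of a longer run of A's.
_TEN_UNITS = re.compile(r"(?<!A)(A+G)\1{9}")


def carry_ase(strand: str) -> str:
    """Replace ten adjacent (A)kG units by one (A)(k+1)G until none remain."""
    while True:
        carried = _TEN_UNITS.sub(lambda m: "A" + m.group(1), strand, count=1)
        if carried == strand:
            return strand
        strand = carried


expression = encode(7) + PLUS + encode(5)   # "7 + 5" as a strand
result = carry_ase(add_ase(expression))
print(expression, "->", result, "=", decode(result))
```

In this model "1 + 1" (AGCGAG) becomes AGAG, as in the first diagram, and
a run of ten AG units collapses to AAG, as in the second.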
 
I talked with some other people and one said that the major
obstacle in doing this is that the enzymes are hard to find.
 
OK, that's all folks, what do you think?

Cheers always,
Kai-hsu TAI <b3203054@cc.ntu.edu.tw>
Freshman Dept/Chem Natl Taiwan U
finger for more info


From owner-info-theory@net.bio.net Thu Feb 23 22:00:00 1995
Path: biosci!agate!howland.reston.ans.net!ix.netcom.com!netnews
From: GWolf@ix.netcom.com (John Cooper)
Newsgroups: bionet.info-theory
Subject: Hearing perception and Fooling the human ear
Date: 24 Feb 1995 14:24:50 GMT
Organization: Netcom
Lines: 21
Distribution: world
Message-ID: <3ikq7i$jf8@ixnews3.ix.netcom.com>
NNTP-Posting-Host: ix-bos5-16.ix.netcom.com

This is a question mainly directed at acoustics specialists. As most of 
us know, the magic of movies is made possible by showing a 
number of still images in very rapid succession. If the interval between 
images is below 1/10th of 1 second, the eyes/brain do not 
perceive any still interruptions, just a smooth continuous moving image. 
I have a research project that involves a similar principle with the human 
ear.

1) Can the ear/brain be fooled with sound bytes in a similar way that 
the eye/brain is fooled by still images?

2) And if so, what frequency of change would be required to do this?

I would greatly appreciate any information you could give me on this 
subject, no matter how trivial. Also, please e-mail the responses directly 
to my mailbox; this would be a big help for me also.
Thanks in advance

John D.Cooper
GWolf@ix.netcom.com

From owner-info-theory@net.bio.net Thu Feb 23 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!news2.near.net!info-server.bbn.com!news
From: mweinstein@bbn.com (Michael Weinstein)
Newsgroups: bionet.info-theory
Subject: Question re: Internet Data Feeds of Value
Date: 24 Feb 1995 20:12:43 GMT
Organization: Bolt Beranek & Newman
Lines: 15
Sender: mweinste@labs-n.bbn.com
Message-ID: <3ilejr$o6e@info-server.bbn.com>
NNTP-Posting-Host: mac19-253.bbn.com
X-Posted-From: InterNews 1.0.4@mac19-253.bbn.com
X-Authenticated: mweinste on POP host labs-n.bbn.com

We are building a "knowbot" driven Internet search and retrieval system
to support biomedical decision makers.  Toward this end, we would be
pleased to receive any biases (opinions) regarding the best data feeds;
brief annotations with reasons would be even better.

In return, we will keep a list of people who reply and provide an update
(should they wish) on this effort to use relevant AI as an information
search and analysis tool.

thanking you in advance


Michael Weinstein
BBN:  Internet Technologies and Services Group
mweinstein@bbn.com

From owner-info-theory@net.bio.net Thu Feb 23 22:00:00 1995
Path: biosci!galaxy.ucr.edu!ihnp4.ucsd.edu!agate!newsxfer.itd.umich.edu!gatech!howland.reston.ans.net!usc!usc!not-for-mail
From: wang@cajal.usc.edu (X. Wang)
Newsgroups: bionet.info-theory
Subject: Re: small UTM
Date: 24 Feb 1995 08:57:35 -0800
Organization: University of Southern California, Los Angeles, CA
Lines: 29
Sender: wang@cajal.usc.edu
Message-ID: <3il35v$bpu@cajal.usc.edu>
References: <lloyd.793496965@bruce.cs.monash.edu.au>
NNTP-Posting-Host: cajal.usc.edu
Keywords: universal Turing machine

In article <lloyd.793496965@bruce.cs.monash.edu.au> lloyd@cs.monash.edu.au (Lloyd Allison) writes:
>Does anyone know (refs) what the state of the art is in the search for
>the smallest (various notions of size) Universal Turing Machine ?
>
>Lloyd ALLISON
>Central Inductive Agency,
>Dept. of Computer Science, Monash University, Clayton, Victoria 3168, AUSTRALIA
>tel: 61 3 905 5205       fax: 61 3 905 5146       email: lloyd@cs.monash.edu.au
><A HREF="http://www.cs.monash.edu.au/~lloyd">LA home</A>
>

Hi, 
I was teaching an introductory computer science course last Fall at UCLA.
Here is a reference I found then on a shortest description of the universal 
Turing machine:

	"A Universal Turing Machine", S. Aanderaa (staal.aanderaa@math.uio.no),
	Computer Science Logic, Lecture Notes in Computer Science, 
	pp. 1-4, 1993.

It is a short paper. Please compare it with the version given by 
Minsky in his 1967 book, "Computation: Finite and Infinite Machines".

Hope this helps,

-- Xin Wang
xwang@cs.ucla.edu



From owner-info-theory@net.bio.net Sat Feb 25 22:00:00 1995
Path: biosci!agate!trib.apple.com!afsg.apple.com!NewsWatcher!user
From: zimnik@ftdetrck-atmo1.army.mil (Paul R. Zimnik, D.O.)
Newsgroups: bionet.info-theory
Subject: MILITARY TELEMEDICINE:  ON-LINE TODAY RESEARCH, PRACTICE, AND OPPORTUNITIES
Date: Sun, 26 Feb 1995 11:22:29 -0500
Organization: Medical Advanced Technology Management Office
Lines: 89
Message-ID: <zimnik-2602951122290001@140.139.41.101>
NNTP-Posting-Host: ftdetrck-matmoweb.army.mil

*******************************************************************************************************************

MILITARY TELEMEDICINE:  ON-LINE TODAY
RESEARCH, PRACTICE, AND OPPORTUNITIES

*******************************************************************************************************************


Forum Chair:
Brigadier General Russ Zajtchuk
Commander
US Army Medical Research and Materiel Command
Fort Detrick
Frederick, MD

Department of Defense Co-Chair:
COL Anna Chacko, MD, Chairman of Radiology 
Brooke Army Medical Center 
San Antonio, Texas 

International Co-Chair
Harold Glass, PhD, Department of Imaging Physics
Hammersmith Post Graduate Hospital
London, United Kingdom

AUSA Collaboration:
GEN Jack Merritt (ret)
Washington, DC

The McLean Hilton at Tyson's Corner
7920 Jones Branch Drive
McLean, VA 22102
March 27-29, 1995

For registration information contact:
Mark Schnur
Medical Advanced Technology Management Office
301-619-7927
schnur@ftdetrck-atmo1.army.mil

Mission Statement: 

The DOD has focused on the global research, development and deployment of
sophisticated communication, management and imaging network systems, which
will become an integral part of patient care activities. What lessons can
we learn from the experience of the DOD and others? What can we expect in
the future? 

A national forum has been organized to develop a direction for the future
by addressing these demanding questions and seeking the advice of experts
in academics, industry and the medical community. Submitted papers will be
considered for  inclusion in poster sessions. 

On January 30, 1995, the Speaker of the House of Representatives, the
Honorable Newt Gingrich speaking before the American Hospital Association
challenged US medicine with the following:

" I come here today to ask the American Hospital Association and all of
its members to profoundly rethink your stance and your assumptions, to
literally say erase the board.  I don't care what your positions were as
of 9 this morning, just drop all of them and rethink it . . . . if we
could cut three to five years out of the transition from R.&D., to
treatment, and if we could be networked to things like Internet, so that
every doctor and every hospital has equal access, equal information, so
that literally when you walk in, you're entering the world body of
knowledge. . . . And I'll tell you, people like the U.S. Army are doing
it.  They're trying to design systems where a soldier who's been shot and
has a particular problem, is by distance medicine being connected directly
from the field hospital to the finest specialist on the planet.
Now we can do that for our young men and women in uniform because we have
a large system, systematically thinking through it.  But then we ought to
transfer that to everybody else ."

Who Should Attend: 

The Forum is intended for those involved in the process of reengineering
health care.  This includes hospital administrators, those involved in
medical informatics, and other health providers, scientists, engineers and
facility planners, members of academia and industry with a special focus
on practical approaches to advanced technology as it applies to medicine.

-- 
Paul R. Zimnik, D.O.
Capt, USAF, MC
ProMed Project Manager
Medical Advanced Technology Management Office
Ft. Detrick
Frederick, MD  21702-5000
webmaster@ftdetrck-atmo1.army.mil

From owner-info-theory@net.bio.net Sat Feb 25 22:00:00 1995
Path: biosci!bloom-beacon.mit.edu!gatech!howland.reston.ans.net!news.sprintlink.net!uunet!newstf01.news.aol.com!newsbf02.news.aol.com!not-for-mail
From: sproutsrad@aol.com (SproutsRad)
Newsgroups: bionet.info-theory
Subject: Re: physical meaning of information.
Date: 26 Feb 1995 16:00:44 -0500
Organization: America Online, Inc. (1-800-827-6364)
Lines: 36
Sender: root@newsbf02.news.aol.com
Message-ID: <3iqq5s$pqb@newsbf02.news.aol.com>
References: <3ioo5g$4cl@hamilton.maths.tcd.ie>
Reply-To: sproutsrad@aol.com (SproutsRad)
NNTP-Posting-Host: newsbf02.mail.aol.com

>I don't see any contrast.
>In each case greater _order_ is associated with lower entropy.
>I don't know why you associate this with greater _information_
>in thermodynamics.
 
Thank you.  With some measure of uncertainty I'll sketch out how it looks
to me.
We are talking about the term information which seems to have two
disparate meanings, one as a measure of specificity (as akin to order in
thermodynamics) and one in communication theory.   In thermodynamics
information and order are associated concepts, and systems which have
greater information have greater order and lower entropy. Here one can
associate information/order with a constraint on randomness.  The more the
randomness of a system is constrained the greater is its order or
information and the lower is its entropy.
  In Shannon's formulation of the information output for a given ergodic
source of a given number of symbols any constraint in randomness in the
use of those symbols is associated with greater -redundancy- not greater
information.  Here the greater the entropy of a source the greater is its
information.  
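
  To make the Shannon side of this concrete, here is a small Python sketch
(the function names and the example probabilities are made up for
illustration). It computes the entropy of a memoryless source in bits per
symbol and its redundancy, taken here as one minus the ratio of the entropy
to its maximum for that alphabet size. An ergodic source with memory would
need the conditional probabilities as well; that is left out.

```python
import math


def entropy_bits(probs):
    """Shannon entropy H = -sum p*log2(p), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def redundancy(probs):
    """1 - H/Hmax, where Hmax = log2(number of symbols)."""
    return 1 - entropy_bits(probs) / math.log2(len(probs))


uniform = [0.25, 0.25, 0.25, 0.25]   # unconstrained use of 4 symbols
skewed = [0.7, 0.1, 0.1, 0.1]        # constrained use of the same symbols

for name, probs in (("uniform", uniform), ("skewed", skewed)):
    print(name, round(entropy_bits(probs), 3), round(redundancy(probs), 3))
```

  The constrained source has the lower entropy and the higher redundancy,
which is the Shannon usage described in the paragraph above.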
  It is difficult to sort this out based on things you read.  Take for
instance this quotation from Weaver in his preface to Shannon's The
Mathematical Theory of Communication: "Thus for a communication source one
can say, just as he would also say it of a thermodynamic ensemble, "This
situation is highly organized, it is not characterized by a large degree
of randomness or choice--that is to say,  the information (or the entropy)
is low.""
Your input appreciated.
Don Foster
Paonia, Colorado
SproutsRad





From owner-info-theory@net.bio.net Sat Feb 25 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!agate!howland.reston.ans.net!ix.netcom.com!netcom.com!grunwald
From: grunwald@netcom.com (Stefan Gruenwald)
Subject: New Biotechnology Discussion Group
Message-ID: <grunwaldD4L4zn.7Bv@netcom.com>
Keywords: Biotechnology, Information, Industry, Research, Opportunity
Organization: NETCOM On-line Communication Services (408 261-4700 guest)
X-Newsreader: TIN [version 1.2 PL1]
Date: Sun, 26 Feb 1995 02:16:35 GMT
Lines: 71
Sender: grunwald@netcom.netcom.com

Dear colleagues:

A novel international Biotechnology Discussion List (BIZ-BIOTECH) has 
been created. 

Subscription address:        listserv@netcom.com
Biz-biotech owner address:   grunwald@netcom.com
Posting address:             biz-biotech@netcom.com

* How to SUBSCRIBE to biz-biotech:

    Send the following message
        SUBSCRIBE BIZ-BIOTECH
    to
        LISTSERV@NETCOM.COM

* How to UNSUBSCRIBE from biz-biotech:

    Send the following message
        UNSUBSCRIBE BIZ-BIOTECH
    to
        LISTSERV@NETCOM.COM


One of the major goals of the BIZ-BIOTECH list is to bring together 
scientists and business professionals from industry and academia who work 
on biomedical and biotechnical projects, so that they can share information 
on ideas, projects, resources, etc., creating a worldwide information and 
communication forum around BIOTECHNOLOGY and its business applications 
and opportunities.

Topics of BIZ-BIOTECH include discussion about:
   Technology Transfer
   Public and private sector R&D efforts and resources
   Technical expertise and resources
   Technologies available for licensing and/or cooperative R&D
   Funding requests
   Venture capital sources
   Discussion about new product ideas
   Announcement of new products on the market
   Novel or existing biotechnological companies
   Biotechnology stocks
   Job opportunities
   etc.

Subscribers to this list may be researchers from academia, industry 
professionals, clinicians, students and other individuals interested in 
the topic of biotechnology and its commercial possibilities. Subscribers 
are encouraged to send messages of interest to BIZ-BIOTECH@NETCOM.COM. 
Personal messages and personal replies should be sent to an individual 
and NOT to the whole list; otherwise, they might annoy other 
BIZ-BIOTECH subscribers. Replies of general interest should be 
sent to the entire list.

BIZ-BIOTECH is currently an open, unmoderated list. Any postings are 
automatically accepted. The list owner is Stefan Gruenwald (e-mail: 
grunwald@netcom.com). Subscribers are encouraged to support BIZ-BIOTECH 
actively within their own organizations and personal or professional 
networks. The larger BIZ-BIOTECH becomes, the more useful it will be to 
all its subscribers. Commercial messages which are of general interest to 
the subscribers are allowed on BIZ-BIOTECH.

For further information about BIZ-BIOTECH, please send an e-mail to

Stefan Gruenwald
(grunwald@netcom.com)
BIZ-BIOTECH listowner




From owner-info-theory@net.bio.net Sat Feb 25 22:00:00 1995
Path: biosci!agate!howland.reston.ans.net!news.sprintlink.net!EU.net!ieunet!maths.tcd.ie!not-for-mail
From: tim@maths.tcd.ie (Timothy Murphy)
Newsgroups: bionet.info-theory
Subject: Re: physical meaning of information.
Date: 26 Feb 1995 02:14:08 -0000
Organization: Dept. of Maths, Trinity College, Dublin, Ireland.
Lines: 19
Message-ID: <3ioo5g$4cl@hamilton.maths.tcd.ie>
References: <Pine.SUN.3.91.950217125304.4295A-100000@snfma1.if.usp.br> <3ioiqe$d2g@newsbf02.news.aol.com>
NNTP-Posting-Host: hamilton.maths.tcd.ie

sproutsrad@aol.com (SproutsRad) writes:

>There is the notion of
>information as associated with structure and thermodynamics where the
>greater the organization/information of a system the lower is its
>entropy.  Contrast that with the notion of information in the sense of
>Shannon where the greater the information of a source the higher is its
>entropy. 

I don't see any contrast.
In each case greater _order_ is associated with lower entropy.
I don't know why you associate this with greater _information_
in thermodynamics.

-- 
Timothy Murphy  
e-mail: tim@maths.tcd.ie
tel: +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland

From owner-info-theory@net.bio.net Sat Feb 25 22:00:00 1995
Path: biosci!agate!howland.reston.ans.net!news.sprintlink.net!uunet!newstf01.news.aol.com!newsbf02.news.aol.com!not-for-mail
From: sproutsrad@aol.com (SproutsRad)
Newsgroups: bionet.info-theory
Subject: Re: physical meaning of information.
Date: 25 Feb 1995 19:42:54 -0500
Organization: America Online, Inc. (1-800-827-6364)
Lines: 45
Sender: root@newsbf02.news.aol.com
Message-ID: <3ioiqe$d2g@newsbf02.news.aol.com>
References: <Pine.SUN.3.91.950217125304.4295A-100000@snfma1.if.usp.br>
Reply-To: sproutsrad@aol.com (SproutsRad)
NNTP-Posting-Host: newsbf02.mail.aol.com

Rafael,
 I am most grateful for your question and the ensuing discussion.  I would
like to offer to this dialogue some notions of my own which may be in need
of refinement.
    In general, it seems that science, in placing the mantle of definition
upon the phenomena of nature, finds it expedient to cast off some
attributes which are either unessential or unmanageable.  One of the
attributes of information which Shannon found it necessary to dispense
with was meaning.  Shannon's definition characterises and quantifies the
potential of a "source" to generate "information" and while this
definition is useful enough to communication engineers it must be seen as
a limited subset of a larger notion of information in the world at large
where meaning has effect.
     Check me out here. It seems, as noted, that there is some confusion
and inconsistency about the term information and its relation to the term
entropy.  (I prefer to attribute that confusion to my environs rather than
my own mind but, alas, it's not always possible.)  There is the notion of
information as associated with structure and thermodynamics where the
greater the organization/information of a system the lower is its
entropy.  Contrast that with the notion of information in the sense of
Shannon where the greater the information of a source the higher is its
entropy. Yes, or have I bollixed it up?
     Finally, as to the heart of your question; does the notion of
information have any physical basis in nature?  Here are two possible
statements which would suggest the affirmative:
1) Information is an attribute of nature which we have found useful to
define and quantify.
2)  Information is an active agency, a primal player in effecting the
unfolding of nature.
  I suspect the latter statement to be most useful.  As to the role
information plays in nature, I would offer the following view which is
meant to be both playful and provocative.  Information in nature is the
source of "re-".  (That is, "re-" as in the prefix from the Latin meaning
back or backward, as in return or reiterate.) 

I would very much appreciate comment.

Don Foster
Paonia, Colorado
SproutsRad





From owner-info-theory@net.bio.net Sat Feb 25 22:00:00 1995
Path: biosci!agate!newsxfer.itd.umich.edu!gatech!howland.reston.ans.net!news.moneng.mei.com!uwm.edu!reuter.cse.ogi.edu!netnews.nwnet.net!nntp.cac.washington.edu!math.washington.edu!hillman
From: hillman@math.washington.edu (Christopher Hillman)
Newsgroups: bionet.info-theory
Subject: Re: Self- Organization and Information Theory
Date: 26 Feb 1995 20:24:58 GMT
Organization: "University of Washington, Mathematics, Seattle"
Lines: 22
Distribution: world
Message-ID: <3iqo2q$2eh@news.u.washington.edu>
References: <Pine.SUN.3.91.950223133843.4555D-100000@snfma1.if.usp.br>
NNTP-Posting-Host: ionesco.math.washington.edu

In article <Pine.SUN.3.91.950223133843.4555D-100000@snfma1.if.usp.br>,
szeinfel@SNFMA1.IF.USP.BR (Rafael Iosef Najmanovich Szeinfeld {S) writes:

|> 	While reading this paragraph, it came to my mind a recent paper 
|> on the evolution of the genetic code using group theory, it appeared in 
|> Physical Review Letters 1993, December or November issue. 
|> 
|> title:	Algebraic approach to the evolution of the genetic code.
|> author: J.E. Hornos and I. Hornos
|> 
|> 	
|> if someone is interested and cannot find the article I will try to find 
|> it.

I saw that too, and at first I thought it was a joke.  I would be interested
in hearing the comments of anyone (Koichiro Matsuno?) who knows more about
how symmetry breaking works in physics.  I don't know enough about this to
really have any idea whether their idea can be taken seriously--- if it's not
utterly crackpot it must be a very interesting application of group theory,
one of my favorite mathematical subjects!

Chris Hillman

From owner-info-theory@net.bio.net Sat Feb 25 22:00:00 1995
Path: biosci!uunet.uu.net!tripos!rigel!zauhar
From: tripos!rigel!zauhar@uunet.uu.net (Randy Zauhar)
Newsgroups: bionet.info-theory
Subject: Re: "Complexity" and Development
Date: 26 Feb 1995 11:19:31 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 38
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <199502261913.TAA22479@rigel.tripos>
NNTP-Posting-Host: net.bio.net


 Chris Hillman writes:

   > 
   > Yes, but it does not follow that all the information needed to describe the
   > cell (or organism) resides within the genome.  Some of this ``information''
   > is presumably encoded not within the genome but, in some complicated fashion,
   > in the physical structures present along with the genome when the germ cell
   > first divides.  In short, ``naked DNA'' cannot develop into even a simple
   > organism; you need some minimal ``infrastructure'' to start decoding the DNA
   > and transforming the instructions into actions which ``construct'' your organism.

   Indeed, development of the three-dimensional form of an organism involves
 feedback between "geometry" (the arrangement of cells in tissues, mechanical
 strains, etc.) and gene expression. This goes beyond just having the 
 basic machinery of the single cell for transcription, translation and 
 regulation. I don't think that one can realistically argue that the sequence
 in genomic DNA in any way constitutes a 'program' in the conventional sense
 for building an organism. 

   At a lower level of organization, the three-dimensional conformation 
 of a protein also arises from interactions with the environment, and
 the peptide sequence does not in a simple or straightforward way make
 a 'program' for generating a three-D structure. The best we can say is 
 that there is a mapping between sequences and conformation that is 
 one-to-one under a given set of environmental conditions (i.e. a given
 sequence reproducibly gives rise to the same fold). 

    Randy

All opinions expressed here are mine, not my employer's

///////////////////////////////////////////////////////////////////////// 
\\ Randy J. Zauhar, PhD             | E-mail: zauhar@tripos.com        //
\\ Tripos, Inc.                     |       : uunet!tripos.com!zauhar  //
\\ 1699 S. Hanley Rd., Suite 303    |  Phone: (314) 647-1099 Ext. 3382 //
\\ St. Louis, MO 63144              |                                  //
/////////////////////////////////////////////////////////////////////////

From owner-info-theory@net.bio.net Sun Feb 26 22:00:00 1995
Path: biosci!VOSCC.NAGAOKAUT.AC.JP!kmatsuno
From: kmatsuno@VOSCC.NAGAOKAUT.AC.JP (koichiro matsuno/7129)
Newsgroups: bionet.info-theory
Subject: Re: Self- Organization and Information Theory
Date: 26 Feb 1995 17:27:21 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 49
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <9502270130.AA16436@voscc.nagaokaut.ac.jp>
NNTP-Posting-Host: net.bio.net

In article <3iqo2q$2eh@news.u.washington.edu> hillman@math.washington.edu
(Christopher Hillman) writes:

>In article <Pine.SUN.3.91.950223133843.4555D-100000@snfma1.if.usp.br>,
>szeinfel@SNFMA1.IF.USP.BR (Rafael Iosef Najmanovich Szeinfeld {S) writes:
>
>|> title:	Algebraic approach to the evolution of the genetic code.
>|> author: J.E. Hornos and I. Hornos
>|> 
>|> if someone is interested and cannot find the article I will try to find 
>|> it.

>I saw that too, and at first I thought it was a joke. ... 
>--- if it's not
>utterly crackpot it must be a very interesting application of group theory,
>one of my favorite mathematical subjects!

   Perhaps, it would be more than just a joke. The essence of the problem of
relating the triplet codons to amino acids is how to get 20 out of 64=4x4x4.
What the Hornos couple have done is to look for a 64 dimension irreducible 
representation of a certain group that could squeeze out 20 dimensions with
the help of a certain symmetry breaking. The Hornos in fact discovered that 
the symplectic group Sp(6), a least popular Lie group?, does the job if an 
appropriate symmetry breaking is available. This kind of trick has been quite
popular among nuclear physicists. What they have been saying is "We don't
know why, but it works!". 

   At issue is the physical nature of the underlying symmetry-breaking process.
What the Hornos would seem to imply is that there must be a certain physical
process out there that may break the symmetry property latent in Sp(6). 
Indirect evidence for this is that they confirmed a correspondence between
the Casimir invariants of the 20 dimension irreducible representation and the 
polarities of 20 different kinds of amino acids. 

   Something must be there, though I am not knowledgeable enough to envision 
what that would be. One of the problems is how to reconcile the scheme with
the GC code if the latter would have been an evolutionary precursor. John 
Jungck may know more.

   Regards,
   Koichiro 

   Koichiro Matsuno
   Department of BioEngineering
   Nagaoka University of Technology
   Nagaoka 940-21, Japan

   kmatsuno@voscc.nagaokaut.ac.jp


From owner-info-theory@net.bio.net Sun Feb 26 22:00:00 1995
Path: biosci!agate!library.ucla.edu!ihnp4.ucsd.edu!network.ucsd.edu!ridgway
From: ridgway@inls1.ucsd.edu (Doug Ridgway)
Newsgroups: bionet.info-theory
Subject: Re: Chaos Theory
Date: 27 Feb 1995 00:15:10 GMT
Organization: University of California at San Diego
Lines: 12
Message-ID: <3ir5ie$o1e@network.ucsd.edu>
References: <3iicro$rq6@newstand.syr.edu>
NNTP-Posting-Host: routh.ucsd.edu
X-Newsreader: TIN [version 1.1 PL8]

There was an article in Nature in the fall on an experiment where
they were controlling chaos in epileptic brain tissue. They had 
preparations of brain tissue in a dish drugged into a state of
random firing, and found that they could control it for the
behavior they wanted by small electric impulses. Since the control
pulses cause a firing, instead of being a perturbation of a system
parameter, it's not quite the usual Ott Grebogi Yorke control of
chaos scheme, but it was interesting. I can find a reference if you
want.

doug.


From owner-info-theory@net.bio.net Sun Feb 26 22:00:00 1995
Path: biosci!bloom-beacon.mit.edu!gatech!howland.reston.ans.net!EU.net!ieunet!maths.tcd.ie!maths.tcd.ie!not-for-mail
From: tim@maths.tcd.ie (Timothy Murphy)
Newsgroups: bionet.info-theory
Subject: Re: physical meaning of information.
Date: 27 Feb 1995 00:47:12 -0000
Organization: Dept. of Maths, Trinity College, Dublin, Ireland.
Lines: 23
Message-ID: <3ir7eg$naa@bell.maths.tcd.ie>
References: <3ioo5g$4cl@hamilton.maths.tcd.ie> <3iqq5s$pqb@newsbf02.news.aol.com>
NNTP-Posting-Host: bell.maths.tcd.ie

sproutsrad@aol.com (SproutsRad) writes:

>We are talking about the term information which seems to have two
>disparate meanings, one as a measure of specificity (as akin to order in
>thermodynamics) and one in communication theory.   In thermodynamics
>information and order are associated concepts and systems which have
>greater information have greater order and lower entropy. Here one can
>associate information/order with a constraint on randomness.  The more the
>randomness of a system is constrained the greater is its order or
>information and the lower is its entropy.

Could you give a reference to the use of the term "information"
with this meaning in thermodynamics ?

I didn't think the word "information" _was_ used in thermodynamics
until after Shannon's Theory was expounded;
and then it was used in Shannon's sense.

-- 
Timothy Murphy  
e-mail: tim@maths.tcd.ie
tel: +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland

From owner-info-theory@net.bio.net Sun Feb 26 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!ns1.faseb.org!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Review of Yockey's Book: Preface and Chapter 1
Message-ID: <D4MvrC.2KG@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
Date: Mon, 27 Feb 1995 00:52:23 GMT
Lines: 219

Here are some of my responses to Hubert Yockey's book:

@book{Yockey1992,
author = "H. P. Yockey",
title = "Information Theory in Molecular Biology",
publisher = "Cambridge University Press",
address = "Cambridge",
isbn = "0-521-35005-0",
comment = "40 West 20th Street,
New York, N. Y.  10011-4211,
order number 350050",
phone = "1-800-827-7423",
price = "price as of 1994 October 31: \$74.95",
year = "1992"}

These comments are given in a constructive spirit.  I hope that Hubert will be
able to incorporate some or all of them into the second edition of his book.

PROLOGUE

First, so that people won't get the wrong idea from my little pickings below, I
want to say that the introductory material is generally very readable and makes
many excellent points about science.  Theoretical work is often misunderstood
by mainstream biologists, even though the example of theoretical versus
experimental physics is well known.  The proper use of theory in molecular
biology is still a ways off.

page 4:  "We must distinguish clearly between axioms and the theorems from
which they are derived."  This should be reversed:  "We must distinguish
clearly between theorems and the axioms from which they are derived." since
theorems are derived from axioms.

page 4: "information may be transferred ... DNA to protein."  What is an
example of this?  Translation of single stranded DNA in vitro?

page 5: "The question is: can information be transferred from a source with a
20-letter alphabet to a receiver with 64 letters or code words?" Though this is
early in the book, the answer is clearly yes:  so long as the information
content of the 20 letter alphabet is less than or equal to that of the 64
letter alphabet.
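
The capacity argument can be made concrete with a small calculation. This is a sketch that assumes equiprobable symbols in both alphabets, so the figures are upper bounds on bits per symbol; the variable names are illustrative only.

```python
import math

# Maximum bits per symbol for each alphabet, assuming equiprobable symbols.
bits_per_amino_acid = math.log2(20)   # 20-letter alphabet
bits_per_codon = math.log2(64)        # 64-letter alphabet (4 x 4 x 4 codons)

print(f"20-letter alphabet: at most {bits_per_amino_acid:.2f} bits per symbol")
print(f"64-letter alphabet: at most {bits_per_codon:.2f} bits per symbol")

# The transfer is possible as long as the source needs no more bits per
# symbol than the receiver's alphabet can carry.
fits = bits_per_amino_acid <= bits_per_codon
print("fits:", fits)
```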

page 6: "information per symbol, H" - no this is the uncertainty.  As I've said
before, to call this the information leads to confusion.

page 7: "... the biological information system must be able to accommodate the
genetic messages of all organisms that have lived in the past, are now living
or may live in the future."  This can't be correct.  The system either works or
the organism dies.  It CANNOT plan for the future.  Indeed, it cannot even
accommodate the past, since it only works here-now, as in Zen.

page 7:  Hubert is discussing von Neumann's suggestion to Shannon to call his
measure "entropy".  This, as we all know, has caused a ruckus ever since.  It
seems to me that it was actually a mistake of von Neumann to say that, as the
units are wrong.

page 7:  "The Maxwell-Boltzmann-Gibbs entropy of thermodynamics is based on the
probability of a selection of elements different from that of Shannon and the
two have no relation."  ... unless the probabilities they both use refer to the
same thing!

page 8: "It is on this theorem [capacity] that the ability of the genetic
message to preserve for so many million years the information needed to form a
coelacanth is based."  This would seem to imply to me that the genome of the
coelacanth is coded for preservation.  We don't know of any such codes.  It
seems more likely to me that the explanation is that this organism has been
"lucky" enough to live in an environment which has been unchanging for this
time.  The other aspect of this sentence is the implication that the genome
didn't change.  We don't really know that.  We know that the fish looks the
same, but we don't (yet??) have DNA samples to compare the old with the
recent.  One might have expected a lot of neutral drift if the shape of the
organism was being held in place by selection.  On the other hand, Yockey's
sentence would imply that some (amazing!) coding mechanism is keeping the
entire genome unchanged.  The latter seems unlikely to me.

page 8:  "The exact location of the various atoms that compose these
informational molecules is unimportant and merely clutters our thinking."  The
exact meaning of this sentence is frequently misunderstood by molecular
biologists.  The point here is that one can learn a great deal merely by
looking at the numbers of the states of a system, without looking at the
detailed structure.  Molecular biologists are so pre-occupied with structure
that they often miss interesting details because of this.  On the other hand,
every information system has a physical basis, and this can influence the
method of coding.  The simplest case of this which I am aware of is the
preponderance of protein contacts that use up to 2 bits of information from the
major groove, and the limitation of informational contacts from the minor
groove to 1 bit of information (in B-form DNA).  (See: Papp et al JMB
233:219-230, 1993.)  The reason for this depends greatly on the physical
locations of atoms in DNA.  In that case, we were quite surprised to find an
"exception".  It's a good example of building a simple, reasonable theory and
then looking for exceptions to learn new things.  The trouble is, in the
current anti-theory climate of biology the exceptions are misunderstood and
used to dismiss the theory before anything has been learned.

page 9:  Although Maddox has called many times for theory in biology, he
rejected one of my papers that did exactly what he was asking for.  He is not
supportive of theoretical biology.  (Actions speak louder than words.)

page 11:  "However, when theories are based on fundamental principles and not
on _ad_hoc_ scenarios, the error is in not taking them seriously enough.
Jaynes (1957a) points out that it is when theories fail to predict the results
of experiment that they are most useful.  Such discrepancies alert us to new
knowledge.  When a theory predicts correctly we are simply puzzle solving and
confirming what we already know (Kuhn, 1970)." BRAVO!  More molecular
biologists should read this!!

CHAPTER I

page 25: "Reasoning from axioms is the highest form of human thought."

This reminds me of the following wonderful quote:
********************************************************************************
The proof may seem to be unsatisfying: each step is correct, and hence the
conclusion is true, but it is not clear why the steps are there and where they
came from.  That is because there are at least eight levels of mathematical
understanding, and it is hard for someone on a lower level to appreciate what
goes on at a higher level.  The levels are, I think:

1. Being able to do arithmetic.

2. Being able to substitute numbers in formulas.

3. Given formulas, being able to get other formulas.

4. Being able to understand the hypotheses and conclusions of theorems.

5. Being able to understand the proofs of theorems, step by step.

6. Being able to _really_ understand the proofs of theorems:  that is, seeing
why the proof is as it is, and comprehending the inwardness of the theorem and
its relation to other theorems.

7. Being able to generalize and extend theorems.

8. Being able to see new relationships and discover and prove entirely new
theorems.

Those of us stuck on level 5 can no more understand the workings of a level 8 mind
than a cow could understand calculus.

Elementary Number Theory, 2nd ed, page 103-104, by Underwood Dudley.
********************************************************************************
(Does anybody know the year of publication and whether it is still in print?)

page 32:  Equation 1.7 seems to have an error.  The middle part of the
equation has 4 additive terms.  The third one should probably be a sum
(Sum from r = 1 to n-3, I think).

page 33:  7th line from top appears to be a typo.  "expanding (p1+p2+pk)n"
should probably be "expanding (p1+p2+...+pk)^n".  I imagine this is a
consequence of Hubert not typesetting the equations himself using TeX.

page 33: just below (1.9):  "Equation (1.9) ..." should be "Expression (1.9)
..." since there is no equality given.

page 41:  There is a reference to figure 1.2, but that is way far away on page
51.  It should be on the same page.  Also, it seems to refer to the wrong
figure, as CUG is mentioned, but that is in figure 1.3, not 1.2.

page 43:  The kind of stochastic matrix should be mentioned right at the
start.  As Dudley says, at level 5 (which we need to be to follow the text
here! ;-) the steps have to be clear.  More text as to the reasoning of the
steps would be useful.  For example, just before equation (1.42) "Consider the
equation" is useless, but "by definition of matrix multiplication" would be
much more helpful to the reader.  Two lines later "Then" (a waste word) could
be replaced by "by (1.42)".  The following line is justified by "since sum_k
pkj = 1".

page 50:  "UGA is an absorbing state because it is a termination codon,
therefore any transition to that state cannot lead to another."  The text here
has gotten so involved in the math that it has lost touch with the biology.
The transition diagram of figure 1.2 appears to be for mutations between bases
of a codon.  It isn't clear where the codon is (on a particular mRNA?  All
mRNAs?).  But if the transitions are about mutations, then there will be some
codons that can perfectly well mutate to a stop codon: especially near the end
of a protein or when it is not a coding frame at all.

My point is this:  mutation is very different from translation.  In translation
the stop codon means stop, while by mutations it is not an "absorbing state".
Motion ALONG an mRNA is NOT the same as changes within a single codon on an
mRNA.

Also, the transitions are given in RNA (with U's), but mutational changes
would, in most organisms, occur in the DNA.  The implication here is changes in
an RNA virus.

I suggest that the example be based on a realistic model to avoid freaking out
molecular biologists.

(On page 55 is "in generating a protein sequence", but the method is not
clear:  translation?  By evolution?  Here the implication is that the
transitions are from one codon to the next, but then there would be no
constraints as shown in the figures.)

page 52:  The mathematically correct conclusion that the system will ultimately
end up in the absorbing state is biologically weird.  I have no idea what this
math is modelling.  Probability of creating various peptides?  Then why are
there mutations in particular places in the codons?  (This is the same problem
I pointed out above about the strangeness of the example given.)

page 53: Equation (1.64) gives a transition matrix that does not match the
transition graph of figure 1.3.  According to the matrix, all mutations are
allowed, but a number of them are missing from the figure, for example from AUG
to GUG.  Such transitions may be more common than the ones allowed by the
figure.  (This is related to exercise 9.)

page 54: Although I agree that the mathematics gives the exact result directly,
the computation is not so bad on modern computers.  I wrote a simple transition
matrix multiplication program in a few minutes.  The matrix gives 0.25000000 in
every position by 17 steps, and this took about 0.03 seconds on a sparcstation
20/61.
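
A program of this kind might look like the following sketch. The matrix from equation (1.64) is not reproduced in this post, so the 4 x 4 matrix P below is a hypothetical doubly stochastic stand-in (0.7 on the diagonal, 0.1 elsewhere), and the number of steps it needs will generally differ from the 17 reported above. The loop multiplies by P until every entry agrees with 0.25 to eight decimal places.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def steps_to_uniform(p, tol=5e-9, max_steps=1000):
    """Return (n, P^n) for the first n at which every entry of P^n is within tol of 1/size."""
    size = len(p)
    power = p                      # power holds P^step at the top of each loop
    for step in range(1, max_steps + 1):
        if all(abs(x - 1.0 / size) < tol for row in power for x in row):
            return step, power
        power = mat_mul(power, p)
    raise RuntimeError("no convergence within max_steps")

# Hypothetical transition matrix (NOT the one in Yockey's equation 1.64).
P = [[0.7 if i == j else 0.1 for j in range(4)] for i in range(4)]

steps, limit = steps_to_uniform(P)
print("steps:", steps)
for row in limit:
    print(" ".join(f"{x:.8f}" for x in row))
```

The uniform limit relies on the stand-in matrix being doubly stochastic (rows and columns each sum to 1); a matrix with an absorbing state would behave differently.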

I will eventually get to later parts of the book.

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www-lmmb.ncifcrf.gov/~toms

From owner-info-theory@net.bio.net Sun Feb 26 22:00:00 1995
Path: biosci!rutgers!gatech!swrinde!howland.reston.ans.net!news.sprintlink.net!uunet!in1.uu.net!nntp.cac.washington.edu!math.washington.edu!hillman
From: hillman@math.washington.edu (Christopher Hillman)
Newsgroups: bionet.info-theory
Subject: Many distinct ``entropies'' exist!
Date: 27 Feb 1995 21:36:05 GMT
Organization: "University of Washington, Mathematics, Seattle"
Lines: 24
Distribution: world
Message-ID: <3itgk5$drm@news.u.washington.edu>
NNTP-Posting-Host: escher.math.washington.edu
Keywords: Algorithmic complexity, Kolmogorov, Chaitin, Solomonoff

There is a new English translation of an important survey paper on algorithmic
information theory circa 1981 by V. V. Vyugin (translated by V. A. Uspensky)
available in Selecta Mathematica Vol. 13 No. 4 (1994).  I mention this here
because as Uspensky emphasizes in his introduction, few Westerners working
in this area seem to realize that not all definitions of algorithmic entropy
agree.  I learnt about this from the book edited by Uspensky cited in an earlier post,  
but as VAU says, Vyugin's paper is the only paper containing a proof that the
divergence between two common definitions of algorithmic entropy is UNBOUNDED.
The Vyugin paper also stresses that each definition has its own advantages
(and disadvantages).

The point is that in the former Soviet Union it is evidently already well
appreciated that there are many kinds of ``entropy'', each with its own sphere
of application.  With regard to the confusion that applying one name to hundreds
of different quantities can cause: this is a common problem in mathematics and
not really avoidable, given the hundreds of thousands of named objects and
the paucity of names for naming them with.  Probably the wisest course is to
give a definition whenever you introduce the word ``entropy''--- ideally we'd
all follow this even in postings, but clearly this is often impractical (for
instance, I won't attempt to define the two notions of entropy I refer to above---
if you're curious, the best way to learn them and to appreciate their distinction
is to read the Vyugin paper).

Chris Hillman

From owner-info-theory@net.bio.net Sun Feb 26 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!news.sprintlink.net!uunet!in1.uu.net!nntp.cac.washington.edu!math.washington.edu!hillman
From: hillman@math.washington.edu (Christopher Hillman)
Newsgroups: bionet.info-theory
Subject: The Hornos paper on Symmetry Breaking and the Genetic Code
Date: 27 Feb 1995 21:12:19 GMT
Organization: "University of Washington, Mathematics, Seattle"
Lines: 255
Distribution: world
Message-ID: <3itf7j$dr5@news.u.washington.edu>
References: <9502270130.AA16436@voscc.nagaokaut.ac.jp>
NNTP-Posting-Host: escher.math.washington.edu
Keywords: representation theory, enumeration theory, biocomplexity


           THE NATURE OF THE PROBLEM CONSIDERED BY HORNOS AND HORNOS

In article <9502270130.AA16436@voscc.nagaokaut.ac.jp>,
kmatsuno@VOSCC.NAGAOKAUT.AC.JP (koichiro matsuno/7129) writes:

|>    Perhaps, it would be more than just a joke.

In that case, I'm interested in learning more about their idea.  Counter-intuitive
applications are usually the most interesting!

|> The essence of the problem of
|> relating the triplet codons to amino acids is how to get 20 out of 64=4x4x4.

Just to make sure I understand you correctly, let me try to rephrase this.
Given the fact that there are four types of base and that each codon is
three bases long, there are 4^3 = 64 possible codons.  However, there are
essentially only 20 amino acids, most of which are coded for by more than
one codon (usually closely related).  The question the Hornos' addressed
was: what was the ``original'' code and how did the present one come about
from it?  (Right?)  One feature which they must explain is that whenever one amino
acid corresponds to (say) three codons, these three codons almost always
differ in only one letter of the three letter ``word''.  (I think Tom
Schneider has previously mentioned an explanation for why this difference
occurs most often in the last letter and least often in the middle letter.)
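
The counting can be checked directly. The sketch below assumes the standard genetic code, written as the usual 64-character string with bases ordered U, C, A, G and '*' marking stop codons; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = ["".join(c) for c in product(BASES, repeat=3)]   # 4^3 triplets
code = dict(zip(codons, AMINO_ACIDS))

degeneracy = Counter(code.values())            # codons per amino acid (or stop)
n_amino_acids = sum(1 for aa in degeneracy if aa != "*")

print("codons:", len(codons))
print("amino acids:", n_amino_acids)
print("stop codons:", degeneracy["*"])
print("codons per amino acid:", dict(sorted(degeneracy.items())))
```

The degeneracy table makes the many-to-one structure explicit: most amino acids have two, four or six codons, and only two have a single codon.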

|> What the Hornos couple have done is to look for a 64 dimension irreducible 
|> representation of a certain group that could squeeze out 20 dimensions with
|> the help of a certain symmetry breaking.


                        REPRESENTATION THEORY

Just for the benefit of those who know even less about this than I do (and
I don't know very much), a representation of a Lie group is a map taking
elements of some (perhaps abstractly defined) Lie group---
that is, a group which is a smooth manifold in its own right---
to a matrix group.  For instance, the circle  group
U(1) = \{ z \in \C: |z| = 1 \} (complex numbers of unit modulus, which form
a group under the usual multiplication of complex numbers) can be
represented by the group of two by two matrices like this:

       | a  -b |
       | b   a |       where a^2 + b^2 = 1

These matrices form a group under matrix multiplication.  The correspondence
is as follows: z = e^{i \theta} is represented by the matrix

       | \cos \theta       -\sin \theta |
       | \sin \theta        \cos \theta |

which is a rotation of \C by angle \theta about the origin.

More generally, a representation is a homomorphism \theta: G ----> GL(X),
which means that \theta(gh) = \theta(g) \theta(h).   This is true for
the example I just gave, since the matrix product 
      
       | \cos \phi  -\sin \phi |   | \cos \psi   -\sin \psi|
       | \sin \phi   \cos \phi |   | \sin \psi    \cos \psi|

   =   | \cos (\phi + \psi)  -\sin(\phi + \psi) |
       | \sin (\phi + \psi)   \cos(\phi + \psi) |

(use the addition formulae for sine and cosine) shows that

  \theta(zw) = \theta(z) \theta(w)

where z = e^{i \phi} and w = e^{i \psi} are in U(1).
Note: GL(X) means the group of invertible matrices on the vector space X. 
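
The homomorphism property for the U(1) example can also be checked numerically.  This is only a sketch: the function name rho (used here for the representation, to avoid a clash with the angle \theta) and the helpers matmul and close are illustrative names, and the two angles are arbitrary.

```python
import cmath


def rho(z):
    """Map a unit complex number z = a + ib to the matrix [[a, -b], [b, a]]."""
    return [[z.real, -z.imag], [z.imag, z.real]]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def close(A, B, tol=1e-12):
    """True if two matrices agree entrywise up to the tolerance `tol`."""
    return all(abs(a - b) <= tol
               for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


phi, psi = 0.7, 1.9  # arbitrary angles, chosen only for the demonstration
z, w = cmath.exp(1j * phi), cmath.exp(1j * psi)

# rho(zw) should equal rho(z) rho(w), up to floating-point rounding.
print(close(rho(z * w), matmul(rho(z), rho(w))))
```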

Sometimes a representation in terms of matrices acting on \R^n has
an invariant subspace of smaller dimension, W \subset \R^n.  This means
that for all g \in G, the matrix \theta(g) \in GL(\R^n) maps W into itself,
that is, \theta(g) W \subset W.
This gives a new representation by restricting the matrices to that
subspace.  In this case the original representation is called ``reducible'',
meaning that it contains a ``smaller'' representation within itself.
``Irreducible'' representations are ``minimal'' in the sense that they
contain no smaller representations; a good analogy is

      a reducible representation <-------> a molecule
      its irreducible constituents <-----> the atoms contained in the molecule
      an irreducible representation <----> an atom (chemical element)

(I am oversimplifying here, since the way in which reducible representations
are composed of irreducible ones is more complex than the way in which
molecules are composed of atoms.)
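
A small concrete case of a reducible representation, as a sketch only: let the circle group act on \R^3 by rotation about the z-axis.  Both the xy-plane and the z-axis are invariant subspaces, and restricting to the xy-plane recovers the two by two rotation matrices given earlier.  The names rot_z and apply are illustrative, and the angle is arbitrary.

```python
import math


def rot_z(t):
    """3x3 matrix of the rotation by angle t about the z-axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def apply(M, v):
    """Apply the 3x3 matrix M to the vector v."""
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]


t = 1.1  # arbitrary angle
in_plane = apply(rot_z(t), [2.0, -3.0, 0.0])  # starts in the xy-plane
on_axis = apply(rot_z(t), [0.0, 0.0, 5.0])    # starts on the z-axis

print(in_plane)  # third coordinate stays zero: the xy-plane is invariant
print(on_axis)   # unchanged: the z-axis is invariant
```

Over the real numbers the two dimensional piece cannot be split further, since a rotation by a generic angle fixes no line in the plane.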

Representation theory is one of the most highly developed of all fields
in mathematics, and has undergone revolutionary developments every few decades
of this century; one of these is occurring even as we speak.  Our library has over
270 books on representation theory!  Many have appeared SINCE 1990; two examples
are:

Author:       James, G. D. (Gordon Douglas), 1945-.
Title:        Representations and characters of groups / Gordon D. James and
              Martin W. Liebeck.
Pub. Info.:   Cambridge ; New York : Cambridge University Press, 1993.
LC Subject:   Representations-of-groups.

(Highly recommended: this is a book designed for self-teaching.  However,
this book only discusses finite groups, not the Lie group reps needed for
following the Hornos' paper.)

Author:       Fulton, William, 1939-.
Title:        Representation theory : a first course / William Fulton, Joe
              Harris.
Pub. Info.:   New York : Springer-Verlag, 1991.
LC Subject:   Representations-of-groups.
              Representations-of-algebras.
              Lie-groups.
              Lie-algebras.

(Concentrates on Lie groups, which need more sophisticated theory.)



   REPRESENTATION THEORY, KNOT POLYNOMIALS, ENUMERATION, AND BIOCOMPLEXITY

There is a deep connection between representations (of Lie algebras)
and the recent work on polynomial invariants of knots, links,
and submanifolds by people like Edward Witten  and Vaughan Jones,
and thus to my inchoate proposal for an axiomatic approach
to defining biocomplexity.  For the connection with three manifolds, see

Author:       Kauffman, Louis H., 1945-.
Title:        Temperley-Lieb recoupling theory and invariants of 3-manifolds /
              by Louis H. Kauffman and Sostenes L. Lins.
Pub. Info.:   Princeton, N.J. : Princeton University Press, 1994.
LC Subject:   Knot-theory.
              Three-manifolds-Topology.
              Invariants-Mathematics.

and for the general connection with superstring theory see

Author:       Atiyah, Michael Francis, 1929-.
Title:        The geometry and physics of knots / Michael Atiyah.
Pub. Info.:   Cambridge ; New York : Cambridge University Press, 1990.
LC Subject:   Knot-theory.

This stuff is also deeply connected to statistical mechanics by an idea due
to Louis Kauffman, who was able to interpret the HOMFLY polynomial of a certain
link as the partition function of a certain model in statistical mechanics.
I recently saw a paper in which it is shown that Witten's work on three manifold
invariants also involves something which looks like a partition function.
All these connections have resulted in explosive development in knot theory
and superstring theory in recent years.   I am particularly excited by the
statistical mechanics connection because this is the first deep connection
between the unified field theory problem and one of the hardest problems of
``classical'' physics, namely the quantitative theory of phase transitions.
(Actually, the first deep connection was the Yang-Baxter theory, and the
second was Wilson's work on renormalization groups, but these are now more
naturally described as the first steps toward the current revolution in
representation theory which began, historically speaking, with Vaughan Jones'
work on knot polynomials, but logically begins with Yang-Baxter.)

There is also a connection to enumeration theory, in particular to enumeration
of the ways of coloring the vertices of a given graph according to certain
constraints.  This gives another connection with my biocomplexity proposal,
and also there is a direct connection with the dual lattices appearing
in my ``algebraic theory of geometric information''.  See:

Author:       Olsson, Jorn.
Title:        Combinatorics and representations of finite groups / Jorn B.
              Olsson.
Pub. Info.:   [Essen : Fachbereich Mathematik, Universitat Essen], 1993.
LC Subject:   Combinatorial-analysis.
              Finite-groups.
              Partitions-Mathematics.
              Representations-of-groups.

Moral of the story: there's never been a better time to learn representation
theory and to look for new applications of it, including perhaps in biology!


        REPRESENTATIONS OF LIE GROUPS AND SYMMETRY BREAKING IN PHYSICS

In physics, Lie groups arise in the following sophisticated manner: it turns
out that one can think of a physical field as a ``section'' of a G-bundle on
space-time \R^4.   This means that every point in \R^4 is assigned
a value in G; this can be thought of as a ``sheet'' in a manifold
of dimension m + 4, where \dim G = m, which locally looks like \R^{m+4}.
The sheet can bulge up and down in various places; the rate of bulging in each
place gives the strength at that place of the corresponding physical field.
(I'm oversimplifying a bit, but I think this gives the idea).  Every section
corresponds roughly to a Newtonian ``potential'' (whose gradient is the
field itself), and there are ``gauge transformations'' taking one potential
to other potentials giving the same physical field.  Basically, the potentials
are easier to work with mathematically, but have the tricky aspect that many
potentials give the same field.  The gauge transformations describe a sort of
symmetry in the process of representing fields by potentials.
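
The point that many potentials give the same field can be seen in the simplest one dimensional analogue (not the U(1) gauge theory itself): adding a constant to a potential leaves the field obtained by differentiating it unchanged.  The sketch below assumes a made-up potential V(x) = x^2, an arbitrary shift of 5, and the sign convention field = -dV/dx; the function names are illustrative.

```python
def field(potential, x, h=1e-5):
    """Approximate the 1-D field -dV/dx by a central difference."""
    return -(potential(x + h) - potential(x - h)) / (2 * h)


def V(x):
    """A made-up potential used only for this demonstration."""
    return x ** 2


def V_shifted(x):
    """A different potential: V plus an arbitrary constant."""
    return V(x) + 5.0


xs = [-1.0, 0.0, 0.5, 2.0]
diffs = [abs(field(V, x) - field(V_shifted, x)) for x in xs]

# The two potentials differ everywhere, yet the fields agree up to rounding.
print(max(diffs))
```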

For electromagnetism, G = U(1), the circle group of two dimensional
rotations discussed above.  This corresponds to the classical idea of 
representing an electro-magnetic field by choosing a ``vector potential'' 
in R^{3+1} such that the exterior derivative (generalizes both curl and div)
of this vector potential field is the electro-magnetic field itself.  The gauge
transformations correspond to certain ``phase changes'' where one rotates each
vector potential a bit. In the case of U(1), these gauge transformations are
fairly simple (see Feynman's Lectures on Physics Vol. II) and the sophisticated
G-bundle picture might seem to be overkill.  But it turns out that the G-bundle
picture gives important ``topological constraints'' on the field.  For
instance, you might expect that a U(1) bundle over the circle would be
a torus, but it might be a Klein bottle instead.   In the U(1) case, the
topological constraints lead to the important Aharonov-Bohm effect.  (Right?)

I think the connection with symmetry breaking is that the topology of the
G-bundle can force certain symmetries.  (Right?)  I guess that breaking
a symmetry of some representation corresponds algebraically
to taking some sort of ``quotient'' representation of the original one.  (Right?)
I suppose this might come about by passing from a G-bundle
to an H-bundle which forces fewer symmetries, but I'm not clear whether H would be
a quotient group or a subgroup of G.  Physically, subgroups strike me as most
natural, but algebraically I would expect a quotient.  Or maybe I'm not even
close.  Can anyone clarify this?


         SYMMETRY BREAKING IN THE EVOLUTION OF THE GENETIC CODE

I am also hoping someone can explain exactly what symmetry breaking
means in this context!  All I understand is that the Hornos' found
a 64 dimensional irreducible representation of Sp(6) which has certain
symmetries which disappear when you take a 20 dimensional ??? quotient ???.
I guess that each symmetry breaking corresponds to a step in the evolution
of the genetic code, progressing from a simple and symmetric code to a 
more complex and much less symmetric one.  Perhaps the original simple code
corresponds to the quotient (by all those symmetries) of a given G-bundle
and the present code corresponds to the G-bundle without any quotients?
(I think I'm pretty confused!)  

|> The Hornos in fact discovered that 
|> the symplectic group Sp(6), a least popular Lie group?, does the job if an 
|> appropriate symmetry breaking is available. This kind of trick has been quite
|> popular among nuclear physicists. What they have been saying is "We don't
|> know why, but it works!". 

Symplectic groups arose historically in Hamiltonian mechanics, so there is
a connection with Maxwell's statistical model of an ideal gas, and thus
another connection to statistical mechanics and to physical entropy.
The group Sp(6) is a certain group of 12 by 12 matrices;
I think these correspond to ``canonical transformations'' in phase space coordinates
for a six particle system in the line; the twelve dimensions correspond to
taking velocities and positions of each of the six particles.  (Right?)  If so,
how does this relate to the three letters of each codon and the four bases?

|>    At issue is the physical nature of the underlying symmetry-breaking process.
|> What the Hornos would seem to imply is that there must be a certain physical
|> process out there that may break the symmetry property latent in Sp(6). An 
|> indirect evidence for this is that they confirmed a correspondence between
|> the Casimir invariants of the 20 dimension irreducible representation and the 
|> polarities of 20 different kinds of amino acids. 

What sort of physical process do they have in mind?  And what is a Casimir
invariant?  (I must be looking at the wrong book, namely Fulton and Harris.)

Chris Hillman

From owner-info-theory@net.bio.net Sun Feb 26 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!swiss.ans.net!newstf01.news.aol.com!newsbf02.news.aol.com!not-for-mail
From: hpyockey@aol.com (HPYockey)
Newsgroups: bionet.info-theory
Subject: Re: physical meaning of information.
Date: 27 Feb 1995 12:38:39 -0500
Organization: America Online, Inc. (1-800-827-6364)
Lines: 91
Sender: root@newsbf02.news.aol.com
Message-ID: <3it2mv$9u6@newsbf02.news.aol.com>
References: <3ir7eg$naa@bell.maths.tcd.ie>
Reply-To: hpyockey@aol.com (HPYockey)
NNTP-Posting-Host: newsbf02.mail.aol.com

Subject: Re: physical meaning of information.
From: tim@maths.tcd.ie (Timothy Murphy)
Date: 27 Feb 1995 00:47:12 -0000
Message-ID: <3ir7eg$naa@bell.maths.tcd.ie>

sproutsrad@aol.com (SproutsRad) writes:

>We are talking about the term information which seems to have two
>disparate meanings, one as a measure of specificity (as akin to order in
>thermodynamics) and one in communication theory.   In thermodynamics
>information and order are associated concepts and systems which have
>greater information have greater order and lower entropy. Here one can
>associate information/order with a constraint on randomness.  The more the
>randomness of a system is constrained the greater is its order or
>information and the lower is its entropy.

Timothy Murphy replies: "Could you give a reference to the use of the term
"information" with this meaning in thermodynamics?

I didn't think the word "information" _was_ used in thermodynamics
until after Shannon's Theory was expounded;
and then it was used in Shannon's sense."

Reply from Hubert P. Yockey:
The word "information" was used in thermodynamics and in statistical
mechanics and in communication before Shannon. The word "information"
leads many people to  a word-trap. 
The first reference I have in which "information" was used in a
thermodynamic sense is "Ueber die Entropieverminderung in einem
thermodynamischen System bei Eingriffen intelligenter Wesen" by Leo Szilard
in Zeitschrift fuer Physik (1929) v53, 840-856. It is believed that this
is the earliest paper in which a relation is drawn between entropy as it is
conceived in statistical mechanics and as it is conceived in
communication. English translation: "On the decrease of entropy in a
thermodynamic system by the intervention of intelligent beings," in
Published Papers in Physics 1925-1939, reprinted from Behavioral Science v
9 October 1964.


The second reference I have is: "The Symmetry of Time in Physics" by the
famous chemist Gilbert Newton Lewis, Science volume LXXI (Roman numerals
for 71), pages 569-577 (1930).
 He says on page 569: "In studying the vastly complex phenomena of nature,
as they come to us through our sense impressions, we could make little
headway did we not segregate and idealize certain groups of like phenomena
for the purpose of special study. Such segregations define the several
branches of science, of which one of the most highly specialized and
idealized is physics. Only a few types of phenomena are included within
its bounds, and in its study we consciously abstain from employing many of
our commonest ideas, such as purpose, goodness, beauty." 

On page 573 he says: "Whence we  have come to our most important
conclusion. The increase of entropy [that is, thermodynamic entropy;
Shannon entropy had not been invented] comes when a known distribution
goes over to an unknown distribution. The loss, which is characteristic of
an irreversible process, is LOSS of INFORMATION" (italics in the original).
The next paragraph begins: "Gain in entropy means loss of information and
nothing more." This is but a sample of what Lewis means. One should read
the whole article.

R. V. L. Hartley was the first to realize the need for: "....a
quantitative MEASURE whereby the capacities of various systems to transmit
INFORMATION may be compared. ... It is desirable therefore to eliminate the
psychological factors involved and to establish a measure of information
in terms of purely physical quantities." (My emphasis.) Bell System
Technical Journal v 7 535-563 (1928). 

The next paper is the well known one by Shannon in the Bell System
Technical Journal 1948. Please note the sentence: "Frequently the messages
have meaning; that is they refer to or are correlated according to some
system with certain physical or conceptual entities. These semantic
aspects of communication are irrelevant to the engineering problem."
To apply this to molecular biology one needs only to replace the word
"engineering" with "biological".

We now come to Chaitin, Kolmogorov, Lofgren, Martin-Loef and Solomonoff. I
have discussed these questions in Information Theory and Molecular Biology,
published by Cambridge University Press in 1992. See also Algorithmic
Information Theory by Gregory Chaitin, third edition, Cambridge University
Press (1989), and Chaitin, Information-Theoretic Incompleteness, World
Scientific, Singapore (1992).

See also When is random random? in Nature v344 p823 (1990) and BioEssays v
17 p85-88 (1995)
"Words are merely names of mathematical functions or ideas and take their
meaning from the mathematics, not the other way around." 

Best regards  Hubert P. Yockey   


From owner-info-theory@net.bio.net Sun Feb 26 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!torn!news.unb.ca!nwsd005.labcsd.unb.ca!r9q9
From: r9q9@unb.ca
Newsgroups: bionet.info-theory
Subject: need faq on luppus
Date: Mon, 27 Feb 1995 16:53:35 GMT
Organization: University of New Brunswick
Lines: 11
Message-ID: <r9q9.11.2F52038F@unb.ca>
NNTP-Posting-Host: nwsd005.labcsd.unb.ca
X-Newsreader: Trumpet for Windows [Version 1.0 Rev B final beta #1]

Does anyone know where I can find info on lupus, the skin disorder?  If you 
have any info on it, I would much appreciate it being sent to me.


thank you 

Kraig MacKinnon

R9Q9@spitfire.unb.ca

 

From owner-info-theory@net.bio.net Sun Feb 26 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!gatech!swiss.ans.net!newstf01.news.aol.com!newsbf02.news.aol.com!not-for-mail
From: hpyockey@aol.com (HPYockey)
Newsgroups: bionet.info-theory
Subject: Re: physical meaning of information.
Date: 27 Feb 1995 10:29:24 -0500
Organization: America Online, Inc. (1-800-827-6364)
Lines: 14
Sender: root@newsbf02.news.aol.com
Message-ID: <3isr4k$8jl@newsbf02.news.aol.com>
References: <3is0lt$66i@newsbf02.news.aol.com>
Reply-To: hpyockey@aol.com (HPYockey)
NNTP-Posting-Host: newsbf02.mail.aol.com

The relation or rather lack thereof between information in communication
as seen by Hartley 1928 and Shannon 1948 is discussed in my book
Information Theory and Molecular Biology, published by Cambridge
University Press, 1992. Check the index; I discuss it in a number of
places. 

Remember that entropy in classical thermodynamics is a concept quite
different from that in classical or quantum statistical mechanics. It has
to do only with energy and work appropriate especially to heat and heat
engines.  Classical thermodynamics does not involve the concept of atoms.
The notion of continuous matter was acceptable until 1909 when the last
chemist agreed that atoms were not just a useful fiction. 

Hubert P Yockey

From owner-info-theory@net.bio.net Sun Feb 26 22:00:00 1995
Path: biosci!uunet.uu.net!tripos!rigel!zauhar
From: tripos!rigel!zauhar@uunet.uu.net (Randy Zauhar)
Newsgroups: bionet.info-theory
Subject: The Hornos paper on Symmetry Breaking and the Genetic Code
Date: 27 Feb 1995 15:18:06 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 27
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <199502272306.XAA18909@rigel.tripos>
NNTP-Posting-Host: net.bio.net



  Chris,

    I know what Lie groups are, and also group representations, so 
 could almost follow your comments on the Hornos' paper (which I do not have).
 The basic idea you relate is that they have a 64-dimensional group, 
 the symmetries of which disappear if you form some sort of quotient
 (with a subgroup?)  My question is: is it conceivable that if the genetic
 code actually specified some number other than 20 amino acids, that you 
 could find another Lie group that would "fit"? Is there something terribly 
 special about the group they are using?

  (I am afraid that the examples of Lie groups that I have encountered
   are pretty trivial compared to what these folks must be doing!) 

    Randy


All opinions expressed here are mine, not my employer's

///////////////////////////////////////////////////////////////////////// 
\\ Randy J. Zauhar, PhD             | E-mail: zauhar@tripos.com        //
\\ Tripos, Inc.                     |       : uunet!tripos.com!zauhar  //
\\ 1699 S. Hanley Rd., Suite 303    |  Phone: (314) 647-1099 Ext. 3382 //
\\ St. Louis, MO 63144              |                                  //
/////////////////////////////////////////////////////////////////////////

From owner-info-theory@net.bio.net Sun Feb 26 22:00:00 1995
Path: biosci!bloom-beacon.mit.edu!panix!zip.eecs.umich.edu!newsxfer.itd.umich.edu!agate!howland.reston.ans.net!news.sprintlink.net!uunet!newstf01.news.aol.com!newsbf02.news.aol.com!not-for-mail
From: sproutsrad@aol.com (SproutsRad)
Newsgroups: bionet.info-theory
Subject: Re: physical meaning of information.
Date: 27 Feb 1995 02:57:49 -0500
Organization: America Online, Inc. (1-800-827-6364)
Lines: 38
Sender: root@newsbf02.news.aol.com
Message-ID: <3is0lt$66i@newsbf02.news.aol.com>
References: <3ir7eg$naa@bell.maths.tcd.ie>
Reply-To: sproutsrad@aol.com (SproutsRad)
NNTP-Posting-Host: newsbf02.mail.aol.com

>Could you give a reference to the use of the term "information"
>with this meaning in thermodynamics ?

 Thank you.  The only example of its use in this fashion which I have
available at the moment comes from a section discussing the second law of
thermodynamics, a footnote on page 419 of General Physics by Douglas
C.Giancoli, 1984: "A more orderly arrangement is thus one that requires
more information  to specify or classify it. When we have one hot and one
cold body, we have two classes of molecules and two pieces of information,
when they come to the same temperature there is only one class and one piece
of information. ... In this sense, information is connected to order, or
low entropy."

>I didn't think the word "information" _was_ used in thermodynamics
>until after Shannon's Theory was expounded;
>and then it was used in Shannon's sense.

That's what I am not sure about.  For example, see if this follows:  You
are seated in a swivel chair in the center of a round room.  You are
holding a ping pong paddle.  Mounted around the wall of the room, equally
spaced are twenty seven squirt guns labeled A through Z and Space.  The
squirt guns fire according to associated letter symbols generated by a
source. Your job is to protect yourself with the ping pong paddle.  If the
letters are generated at random we have a situation which in Shannon's
sense produces maximum uncertainty and hence maximum information (and
perhaps wetness).  If the letters are produced by English text there is
(one hopes) more order, in Shannon's sense less information and you have a
better chance of staying dry.
I'd be pleased to hear your thoughts on this.
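
To put rough numbers on the squirt-gun picture, here is a minimal sketch of Shannon's uncertainty for the 27 symbols.  The sample sentence is only a stand-in for "English text", the function names are illustrative, and only single-letter frequencies are counted, so the correlations between successive letters (which lower the figure further for real English) are ignored.

```python
import math
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # 26 letters plus space


def entropy_bits(probs):
    """Shannon uncertainty -sum p log2 p, in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def empirical_entropy(text):
    """Uncertainty of the single-symbol frequencies observed in `text`."""
    symbols = [ch for ch in text.upper() if ch in ALPHABET]
    counts = Counter(symbols)
    total = len(symbols)
    return entropy_bits(c / total for c in counts.values())


# Guns fired uniformly at random: all 27 symbols equally likely.
uniform = entropy_bits([1 / 27] * 27)

# Guns driven by a short piece of English (a stand-in sample only).
sample = ("the squirt guns fire according to associated letter symbols "
          "generated by a source")
english = empirical_entropy(sample)

print(round(uniform, 3), round(english, 3))
```

The uniform case gives log2(27) bits per symbol, the largest value possible for 27 symbols; any less even distribution, such as the letter frequencies of the sample, gives less.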

Don Foster
Paonia, Colorado
SproutsRad





From owner-info-theory@net.bio.net Mon Feb 27 22:00:00 1995
Path: biosci!rutgers!csn!yuma!purdue!haven.umd.edu!gamera.umd.edu!cbl.umd.edu!not-for-mail
From: ulan@cbl.umd.edu (Robert Ulanowicz)
Newsgroups: bionet.info-theory
Subject: Re: physical meaning of information.
Date: 28 Feb 1995 08:49:12 -0500
Organization: Chesapeake Biological Laboratory
Lines: 22
Message-ID: <3iv9ko$a5o@cbl.umd.edu>
References: <3is0lt$66i@newsbf02.news.aol.com> <3isr4k$8jl@newsbf02.news.aol.com>
NNTP-Posting-Host: cbl.umd.edu

hpyockey@aol.com (HPYockey) writes:

>Classical thermodynamics does not involve the concept of atoms.
>The notion of continuous matter was acceptable until 1909 when the last
>chemist agreed that atoms were not just a useful fiction. 

Some might be interested to learn that classical thermo is still taught 
(or at least *was* taught as late as 30 years ago) as though atoms did 
not exist. I vividly recall preparing with my fellow graduate students 
for my oral exams in thermodynamics. If anyone mentioned the words 
"atoms" or "molecules", "the gong sounded, the crook yanked you offstage - 
you were no longer a graduate student, you were a nobody!" (Apologies to 
Tommy Smothers, in case anyone remembers him! :-)

This probably sounds like a fatuous exercise to many on the net, but it 
gave us an early glimpse into the hierarchical view of nature, that today 
many seem strangely reluctant to accept. As the current saying goes, "What 
goes around, comes around!" :-)

Cheers,
Bob U.


From owner-info-theory@net.bio.net Mon Feb 27 22:00:00 1995
Path: biosci!VOSCC.NAGAOKAUT.AC.JP!kmatsuno
From: kmatsuno@VOSCC.NAGAOKAUT.AC.JP (koichiro matsuno/7129)
Newsgroups: bionet.info-theory
Subject: Re: The Hornos paper on Symmetry Breaking and the Genetic Code
Date: 28 Feb 1995 01:37:19 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 34
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <9502280941.AA08017@voscc.nagaokaut.ac.jp>
NNTP-Posting-Host: net.bio.net

In article <3iuef8$ifr@news.u.washington.edu> hillman@math.washington.edu
(Christopher Hillman) writes:

>In article <199502272306.XAA18909@rigel.tripos>,
>tripos!rigel!zauhar@uunet.uu.net (Randy Zauhar) writes:

>|>  My question is: is it conceivable that if the genetic
>|>  code actually specified some number other than 20 amino acids, that you 
>|>  could find another Lie group that would "fit"? Is there something terribly 
>|>  special about the group they are using?

>I get the impression their representation is very, very special.

   Very special, indeed!! What I am concerned about is the likelihood or 
unlikelihood of salvaging the symmetry-primary view while facing
symmetry-breaking in the biological realm. Although the Hornos couple did
not say so explicitly in their paper, which is based upon the
symmetry-primary view, what they actually meant would seem to be an
extension of the notion of space beyond the ordinary three-dimensional one,
as particle physicists did when they coined isospin space more than 40
years ago. This extension is a key to getting such a symmetry-breaking. I
would first like to see how well they do with theirs, although my favorite
is the opposite (namely, the symmetry-breaking-primary view :-). 

   Regards,
   Koichiro

   Koichiro Matsuno
   Department of BioEngineering
   Nagaoka University of Technology
   Nagaoka 940-21, Japan

   kmatsuno@voscc.nagaokaut.ac.jp


From owner-info-theory@net.bio.net Mon Feb 27 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!news.sprintlink.net!alfa02.medio.net!netnews.nwnet.net!nntp.cac.washington.edu!math.washington.edu!hillman
From: hillman@math.washington.edu (Christopher Hillman)
Newsgroups: bionet.info-theory
Subject: Re: The Hornos paper on Symmetry Breaking and the Genetic Code
Date: 28 Feb 1995 06:05:28 GMT
Organization: "University of Washington, Mathematics, Seattle"
Lines: 69
Message-ID: <3iuef8$ifr@news.u.washington.edu>
References: <199502272306.XAA18909@rigel.tripos>
NNTP-Posting-Host: ionesco.math.washington.edu

In article <199502272306.XAA18909@rigel.tripos>,
tripos!rigel!zauhar@uunet.uu.net (Randy Zauhar) writes:

|>     I know what Lie groups are, and also group representations, so 
|>  could almost follow your comments on the Hornos' paper (which I do not have).

I didn't have it in front of me either.  I was speculating on what it might
mean based upon my memory of it--- probably a dangerous practice.  I am hoping
someone who really understands this will answer all my questions.

|>  The basic idea you relate is that they have a 64-dimensional group,
                       ^guessed at                              ^ representation
                                                                  of a group 
|>  the symmetries of which disappear if you form some sort of quotient
                                                               ^what I guess
                                                                might be a quotient

|>  (with a subgroup?)

You got me.  I really don't know enough about representation theory to be
able to understand how the physicists model ``symmetry breaking'' without
some help from someone who either knows A LOT more about representations
or A LITTLE more about the physical model on which the Hornos & Hornos paper
is based.   I'm somewhat alarmed that Koichiro Matsuno wrote (referring
to symmetry breaking in physics): `` This kind of trick has been quite
popular among nuclear physicists.  What they have been saying is "We don't
know why, but it works!" '' I'm worried because that might mean that there IS
no model, that it is just a trick which no one can even provide heuristic
motivation for.

|>  My question is: is it conceivable that if the genetic
|>  code actually specified some number other than 20 amino acids, that you 
|>  could find another Lie group that would "fit"? Is there something terribly 
|>  special about the group they are using?

I think Koichiro Matsuno could comment more authoritatively about this.
I get the impression their representation is very, very special.

|>   (I am afraid that the examples of Lie groups that I have encountered
|>    are pretty trivial compared to what these folks must be doing!) 

I'm feeling somewhat awed myself.
 
My knowledge of symplectic groups per se is pretty much limited to the
definition and what I said about contact transformations in Hamiltonian
mechanics.  I do know that they fit into a larger picture that I understand,
but I don't think this aspect of symplectic groups (it concerns how
Lie groups which arise as the ``isometry group'' of some quadratic form relate
to their Lie algebras--- the form in the case of symplectic groups looks like

       | 0  -I |
       | I   0 |

where I is the n by n identity matrix) is relevant to symmetry breaking.
The motivation for this strange anti-symmetric quadratic form will be found
in the beautiful book

Author:       Flanders, Harley.
Title:        Differential forms, with applications to the physical sciences.
Pub. Info.:   New York, Academic Press, 1963.
LC Subject:   Differential-forms.
              Mathematical-physics.

This book also contains a beautiful connection between thermodynamical
entropy and Cauchy's theorem in complex analysis, among other treasures.
It was reprinted by Dover books, I think, as a cheap paperback--- possibly
the finest book ever reprinted in that series.
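
For readers who want to experiment with the block form J = [[0, -I], [I, 0]] shown above, here is a minimal sketch of the defining condition M^T J M = J for a matrix M to preserve that form.  The function names and the sample matrices are illustrative only, and integer entries are used so that the comparison is exact.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*A)]


def J(n):
    """Return the 2n x 2n block matrix [[0, -I], [I, 0]]."""
    size = 2 * n
    M = [[0] * size for _ in range(size)]
    for i in range(n):
        M[i][n + i] = -1  # upper-right block is -I
        M[n + i][i] = 1   # lower-left block is I
    return M


def preserves_form(M):
    """True if M^T J M == J, i.e. M preserves the antisymmetric form."""
    n = len(M) // 2
    return matmul(transpose(M), matmul(J(n), M)) == J(n)


# 4x4 example built as diag(A, inverse-transpose of A) with A = [[1, 1], [0, 1]].
shear = [[1, 1, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, -1, 1]]

print(preserves_form(shear))             # block-diagonal shear
print(preserves_form([[2, 1], [1, 1]]))  # 2x2 with determinant 1
print(preserves_form([[2, 0], [0, 1]]))  # 2x2 with determinant 2
```

In the two by two case the condition reduces to the determinant being 1, which is why the last sample matrix fails the check.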

Chris Hillman

From owner-info-theory@net.bio.net Mon Feb 27 22:00:00 1995
Path: biosci!rutgers!gatech!howland.reston.ans.net!Germany.EU.net!ieunet!maths.tcd.ie!not-for-mail
From: tim@maths.tcd.ie (Timothy Murphy)
Newsgroups: bionet.info-theory
Subject: Re: physical meaning of information.
Date: 28 Feb 1995 01:24:38 -0000
Organization: Dept. of Maths, Trinity College, Dublin, Ireland.
Lines: 50
Message-ID: <3itu0m$7l8@hamilton.maths.tcd.ie>
References: <3ir7eg$naa@bell.maths.tcd.ie> <3is0lt$66i@newsbf02.news.aol.com>
NNTP-Posting-Host: hamilton.maths.tcd.ie

sproutsrad@aol.com (SproutsRad) writes:

>>I didn't think the word "information" _was_ used in thermodynamics
>>until after Shannon's Theory was expounded;
>>and then it was used in Shannon's sense.

>That's what I am not sure about.  

A couple of people sent me quotations where the word "information"
was used before Shannon's theory came out,
so I must withdraw my claim above.
However, the references were somewhat peripheral,
so I think it would be true to say that the term 
did not play a significant role in thermodynamics
until after Shannon's theory was known,
at which point many attempts were made to connect
the two notions of entropy.

>For example, see if this follows:  You
>are seated in a swivel chair in the center of a round room.  You are
>holding a ping pong paddle.  Mounted around the wall of the room, equally
>spaced are twenty seven squirt guns labeled A through Z and Space.  The
>squirt guns fire according to associated letter symbols generated by a
>source. Your job is to protect yourself with the ping pong paddle.  If the
>letters are generated at random we have a situation which in Shannon's
>sense produces maximum uncertainty and hence maximum information (and
>perhaps wetness).  If the letters are produced by English text there is
>(one hopes) more order, in Shannon's sense less information and you have a
>better chance of staying dry.
>I'd be pleased to hear your thoughts on this.

I'm not sure that this goes against the intuitive notion of information.
In the first case you need more information (a lot more!) to keep dry.
But that may be a meretricious argument.
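
To put rough numbers on the squirt-gun picture, here is a minimal sketch.
It assumes a 27-symbol alphabet (A-Z plus Space) and uses a short made-up
sample string as a stand-in for "English text"; the function names and the
sample are illustrative only, and such a short sample understates the
uncertainty of real English.

```python
import math
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # 26 letters plus Space = 27 squirt guns

def entropy_bits(probs):
    """Shannon uncertainty H = -sum p*log2(p), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def empirical_entropy(text):
    """First-order uncertainty estimated from symbol frequencies in text."""
    symbols = [c for c in text.upper() if c in ALPHABET]
    counts = Counter(symbols)
    total = len(symbols)
    return entropy_bits(n / total for n in counts.values())

# Letters generated at random: every gun equally likely.
uniform_h = entropy_bits([1 / 27] * 27)

# Hypothetical stand-in for an English source (not real corpus statistics).
SAMPLE = ("you are seated in a swivel chair in the center of a round room "
          "you are holding a ping pong paddle")
english_h = empirical_entropy(SAMPLE)

print(f"random source : {uniform_h:.3f} bits per squirt")
print(f"sample text   : {english_h:.3f} bits per squirt")
```

The random source comes out at log2(27) bits per squirt, the largest value
possible for 27 guns; any uneven letter frequencies give a smaller figure,
which is the sense in which the paddle-holder has less to be uncertain about.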

I suspect this may be the same issue that would arise with energy, say,
if someone said, "How can you say that that ball at the top of the hill
has high energy, when it is not moving at all?"

But to me the essential point is that Chaitin (following Shannon)
gave a precise value to what he termed the "informational content"
(or entropy) H(s) of a string s.
This might not correspond to our intuitive idea of information.
But I think the term helps to "explain" some of the properties of H(s).

-- 
Timothy Murphy  
e-mail: tim@maths.tcd.ie
tel: +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland

From owner-info-theory@net.bio.net Mon Feb 27 22:00:00 1995
Path: biosci!VOSCC.NAGAOKAUT.AC.JP!kmatsuno
From: kmatsuno@VOSCC.NAGAOKAUT.AC.JP (koichiro matsuno/7129)
Newsgroups: bionet.info-theory
Subject: Re: The Hornos paper on Symmetry Breaking and the Genetic Code
Date: 27 Feb 1995 23:59:01 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 54
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <9502280802.AA04527@voscc.nagaokaut.ac.jp>
NNTP-Posting-Host: net.bio.net

In article <3itf7j$dr5@news.u.washington.edu> hillman@math.washington.edu 
(Christopher Hillman) writes:

>In the U(1) case, the
>topological constraints lead to the important Aharonov-Bohm effect.  (Right?)
>
>I guess that breaking
>a symmetry of some representation corresponds algebraically
>to taking some sort of ``quotient'' representation of the original one.  
>(Right?)

   This is also what most physicists are saying.

>Physically, subgroups strike me as most
>natural, but algebraically I would expect a quotient.  Or maybe I'm not even
>close.  Can anyone clarify this?

   There are at least two schools of thought on this type of problem. One
assumes the presence of a primary symmetry that is broken later. The other
takes the process of symmetry-breaking itself to be primary. Most
mathematicians and theoretical physicists take the symmetry-primary view and
represent the breaking of the initial symmetry in terms of quotient
representations. The problem with this scheme is the difficulty of
envisioning the underlying dynamic process that gives rise to the
symmetry-breaking. Only the consequences of symmetry-breaking, if any, can
be represented. The procedure of taking quotient representations is not
actually dynamic. In contrast, the symmetry-breaking-primary view, though
conceivable, has had no theory of its representation up to now.

   One trick for coping with dynamic symmetry-breaking within the
symmetry-primary view may be to take the space being acted upon to be a
fantasy space, different from ordinary geometrical space. Although
the symplectic group Sp(6) can refer to 12x12 matrices of real numbers,
the space implied here may not be the ordinary geometrical space. This is
the only way for me to grope toward what the Hornos have done. I
have no idea what that space would look like.

>And what is a Casimir
>invariant?  (I must be looking at the wrong book, namely Fulton and Harris.)

   A bilinear form of Lie group operators that gives a definite number once
an irreducible representation is fixed (e.g., the square of the angular
momentum operators in the rotation group).
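
   The angular momentum example can be written out explicitly. This is the
standard textbook statement, in units where hbar = 1; the symbols J_x, J_y,
J_z, j and m are notation introduced here for illustration and are not
taken from the Hornos paper.

```latex
\[
  J^2 = J_x^2 + J_y^2 + J_z^2, \qquad [J^2, J_i] = 0 \quad (i = x, y, z),
\]
\[
  J^2 \, |j, m\rangle = j(j+1) \, |j, m\rangle , \qquad m = -j, \dots, j .
\]
```

   So J^2 takes the single value j(j+1) on the whole (2j+1)-dimensional
irreducible representation labelled by j; for spin 1/2 that value is 3/4.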

   Regards,
   Koichiro 

   Koichiro Matsuno
   Department of BioEngineering
   Nagaoka University of Technology
   Nagaoka 940-21, Japan

   kmatsuno@voscc.nagaokaut.ac.jp


From owner-info-theory@net.bio.net Tue Feb 28 22:00:00 1995
Path: biosci!adam.cc.sunysb.edu!news.nysernet.net!news.sprintlink.net!uunet!in1.uu.net!newsflash.concordia.ca!canopus.cc.umanitoba.ca!tribune.usask.ca!rover.ucs.ualberta.ca!alberta!atha!aupair.cs.athabascau.ca!burt
From: burt@aupair.cs.athabascau.ca (Burt Voorhees)
Newsgroups: bionet.info-theory
Subject: Re: The Hornos paper on Symmetry Breaking and the Genetic Code
Message-ID: <burt.794028188@aupair.cs.athabascau.ca>
Date: 1 Mar 95 03:23:08 GMT
References: <9502280941.AA08017@voscc.nagaokaut.ac.jp>
Sender: news@cs.athabascau.ca
Lines: 11

I just came across this thread, and would like to know a
reference to the paper being discussed.  It sounds very
interesting.

Burton Voorhees
Faculty of Science
Athabasca University
Box 10,000
Athabasca, Alberta
Canada   T0G 2R0
burt@cs.athabascau.ca

From owner-info-theory@net.bio.net Tue Feb 28 22:00:00 1995
Path: biosci!bcm!cs.utexas.edu!uunet!in1.uu.net!newstf01.news.aol.com!newsbf02.news.aol.com!not-for-mail
From: sproutsrad@aol.com (SproutsRad)
Newsgroups: bionet.info-theory
Subject: Re: physical meaning of information.
Date: 1 Mar 1995 01:26:10 -0500
Organization: America Online, Inc. (1-800-827-6364)
Lines: 41
Sender: root@newsbf02.news.aol.com
Message-ID: <3j1422$7tu@newsbf02.news.aol.com>
References: <3iv9ko$a5o@cbl.umd.edu>
Reply-To: sproutsrad@aol.com (SproutsRad)
NNTP-Posting-Host: newsbf02.mail.aol.com

> an early glimpse into the hierarchical view of nature

My thanks, I now at least have sufficient information to determine my
relative position amongst the participants of this discussion.  I am
reading more carefully the prior responses, something I failed to do in my
initial rush of enthusiasm. I will also read as suggested.
While a large part of the problem may be with the receiver, it does appear
that information on information undergoes some signal loss in its
transmission to the provinces. As noted, there is a seeming contradiction
(in my mind, at least) between Weaver, speaking of Shannon's equation,
"Thus for a communication source one can say, just as he would also say it
of a thermodynamic ensemble, "This situation is highly organized, it is
not characterized by a large degree of randomness or choice--that is to
say,  the information (or the entropy) is low."";  and Douglas C. Giancoli,
speaking of the second law of thermodynamics, "When we have one hot and
one cold body, we have two classes of molecules and two pieces of
information, when they come to the same temperature there is only one class
and one piece of information. ... In this sense, information is connected
to order, or low entropy."  The former suggests a relation where high
order (of a source) is associated with low information whereas the latter
suggests the opposite.  
   And then there is this quotation from Henry Quastler who wrote the
entry on -- Information theory, biological applications of-- in
McGraw-Hill Encyclopedia of Science and Technology, speaking of the
function H(X), "It is called the information content (amount, quantity of
information) of X and also the uncertainty of X (some prefer opposite signs
for these functions)."  It seems uncharacteristically generous of science
to allow one a choice in this matter.
Please don't feel a need to straighten out my thinking on this, it may
require a radical chiropractic (and some time to look over information
already given).
Sincere thanks,
Don Foster
Paonia, Colorado

SproutsRad

From owner-info-theory@net.bio.net Tue Feb 28 22:00:00 1995
Newsgroups: bionet.info-theory
Path: biosci!ns1.faseb.org!darwin.sura.net!fconvx.ncifcrf.gov!fcsparc6!toms
From: toms@fcsparc6.ncifcrf.gov (Tom Schneider)
Subject: Re: physical meaning of information.
Message-ID: <D4ryD8.Gv8@ncifcrf.gov>
Sender: usenet@ncifcrf.gov (C News)
Nntp-Posting-Host: fcsparc6.ncifcrf.gov
Organization: Frederick Cancer Research and Development Center
References: <3iv9ko$a5o@cbl.umd.edu> <3j1422$7tu@newsbf02.news.aol.com>
Date: Wed, 1 Mar 1995 18:36:44 GMT
Lines: 70

In article <3j1422$7tu@newsbf02.news.aol.com> sproutsrad@aol.com (SproutsRad,
Don Foster ) writes:

| transmission to the provinces. As noted, there is a seeming contradiction
| (in my mind, at least) between Weaver, speaking of Shannon's equation,
| "Thus for a communication source one can say, just as he would also say it
| of a thermodynamic ensemble, "This situation is highly organized, it is
| not characterized by a large degree of randomness or choice--that is to
| say,  the information (or the entropy) is low."";  and Douglas C. Giancoli,
| speaking of the second law of thermodynamics, "When we have one hot and
| one cold body, we have two classes of molecules and two pieces of
| information, when they come to the same temperature there is only one class
| and one piece of information. ... In this sense, information is connected
| to order, or low entropy."  The former suggests a relation where high
| order (of a source) is associated with low information whereas the latter
| suggests the opposite.  
|    And then there is this quotation from Henry Quastler who wrote the
| entry on -- Information theory, biological applications of-- in
| McGraw-Hill Encyclopedia of Science and Technology, speaking of the
| function H(X), "It is called the information content (amount, quantity of
| information) of X and also the uncertainty of X (some prefer opposite signs
| for these functions)."  It seems uncharacteristically generous of science
| to allow one a choice in this matter.

If someone says that information = uncertainty = entropy, then they are
confused, or something was not stated that should have been.  Those equalities
lead to a contradiction, since entropy of a system increases as the system
becomes more disordered.  So information corresponds to disorder according to
this confusion.  In Yockey's book, he tries to get around this by brushing it
under the rug, saying that the meaning of the words comes from the math.

Always take information to be a decrease in uncertainty at the receiver and you
will get straightened out:

R = Hbefore - Hafter.

If I am sending you a bunch of characters, you are uncertain (Hbefore) as to
what I'm about to send.  After you receive a character, your uncertainty goes
down (to Hafter).  Hafter is never zero because of noise in the communication
system.  Your decrease in uncertainty is the information (R) that you gain.

Since Hbefore and Hafter are state functions, this makes R a function of
state.  It allows you to lose information (it's called forgetting).  You can
put information into a computer and then remove it in a cycle.

Many of the statements in the early literature assumed a noiseless channel, so
the uncertainty after receipt is zero (Hafter=0).  This leads to the SPECIAL
CASE where R = Hbefore.  But Hbefore is NOT "the uncertainty", it is the
uncertainty of the receiver BEFORE RECEIVING THE MESSAGE.
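
A small numerical sketch of R = Hbefore - Hafter may help.  The four-symbol
alphabet and the "after" probabilities below are made-up values for a single
received character, chosen only to contrast a noisy channel with the
noiseless special case; the function names are likewise just for
illustration.

```python
import math

def uncertainty_bits(probs):
    """Shannon uncertainty H = -sum p*log2(p), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(before, after):
    """R = Hbefore - Hafter: the receiver's decrease in uncertainty."""
    return uncertainty_bits(before) - uncertainty_bits(after)

# Hypothetical receiver expecting one of four equally likely characters.
before = [0.25, 0.25, 0.25, 0.25]

# Assumed noisy channel: after receipt, a little doubt remains.
after_noisy = [0.97, 0.01, 0.01, 0.01]

# Idealized noiseless channel: no doubt remains, so Hafter = 0.
after_noiseless = [1.0, 0.0, 0.0, 0.0]

r_noisy = information_gain(before, after_noisy)
r_noiseless = information_gain(before, after_noiseless)

print(f"Hbefore        = {uncertainty_bits(before):.3f} bits")
print(f"R (noisy)      = {r_noisy:.3f} bits")
print(f"R (noiseless)  = {r_noiseless:.3f} bits")
```

With noise, R comes out smaller than Hbefore; only in the noiseless case
does R equal Hbefore, which is the special case described above.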

See my posting within the last month.  Work out the information in the bunch of
DNA binding sites I gave there.  See my primer on information theory
(ftp://ftp.ncifcrf.gov/pub/delila/primer.ps) (note: our archive computer is on
the blink and you may not be able to get it immediately).

| Please don't feel a need to straighten out my thinking on this, it may
| require a radical chiropractic (and some time to look over information
| already given).

The problem you are facing is so pervasive that it has muddled this entire
field.

I reworked this posting for the FAQ and will post that momentarily.

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland  21702-1201
  toms@ncifcrf.gov
  http://www-lmmb.ncifcrf.gov/~toms

From owner-info-theory@net.bio.net Tue Feb 28 22:00:00 1995
Path: biosci!VOSCC.NAGAOKAUT.AC.JP!kmatsuno
From: kmatsuno@VOSCC.NAGAOKAUT.AC.JP (koichiro matsuno/7129)
Newsgroups: bionet.info-theory
Subject: Re: The Hornos paper on Symmetry Breaking and the Genetic Code
Date: 1 Mar 1995 00:33:39 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 15
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <9503010837.AA08782@voscc.nagaokaut.ac.jp>
NNTP-Posting-Host: net.bio.net

burt@aupair.cs.athabascau.ca (Burt Voorhees) writes:

>I just came across this thread, and would like to know a
>reference to the paper being discussed.  It sounds very
>interesting.

   That is:

     Hornos, J. E. M. & Hornos, Y. M. M., 1993. Algebraic Model for the 
        Evolution of the Genetic Code. Phys. Rev. Lett. 71, 4401-4404.

Koichiro Matsuno
 



