Prion Digest V1 #6

Chris Swanson, Moderator prion at STOLAF.EDU
Sat Apr 20 12:31:20 EST 1991

Prion Digest                Sat, 20 Apr 91       Volume 1 : Issue   6 

Today's Topics:
                          Origins of Viruses

Date: Thu, 18 Apr 91 14:13:21 -0500
From: swansonc at
Subject: Administrivia
To: prion

Our mail system seems to have dropped any messages for prion,
prion-request, or prion-archive on the floor yesterday (17 and 18 Apr).
As such, I lost all of these messages.  Unless I replied to you
already, please re-send any messages.

One I remember in particular was from a news account (netnews or
something like that) from, I believe (I could be wrong here)
flashing by.  If you know of such an account that made a subscription
request this last week, please request them to re-send it with my

	Take care, 
	-Chris Swanson (Prion Digest Moderator)


Date: Wed, 17 Apr 91 12:29:10 -0400
From: Daniel Enxing <djex at>
Subject: Origins of Viruses
To: prion at

Text item: 

          This message is intended for your new Prion & Virus List.

The following is an attached File item from cc:Mail.  It contains
eight-bit information which had to be encoded to ensure successful
transmission through various mail systems.  To decode the file use the UUDECODE

[ I uudecoded this text and found that the only 8-bit characters were	]
[ ^M's and the standard MS-DOS ^Z EOF at the end.  I removed these and	]
[ replaced the uuencode text with the resulting clear text. 		]
[ - Chris Swanson, Prion List Moderator 				]

-- Plain text follows this line --

                ARE VIRUSES RENEGADES?  
Many years ago, in an attempt to understand how a metazoan cell could 
possibly do the miraculous things that it does, I came across 
John Platt's 'book model' of the cellular machinery ('Horizons in 
Biochemistry', Academic Press, New York, N.Y. 1962) which starts 
     "The expression of genetic information in cells and 
      whole organisms is like the reading out of a complex 
      instruction manual, but the analogy extends to more 
      detail than is generally realized.  The information 
      is linearly arranged in "words" that are "read out" 
      sequentially in time.   There is one copying mechanism 
      (DNA polymerase) for reprinting the whole book, and 
      another (RNA polymerase) for selective read-out into 
      cell chemistry.  The read-out is by "paragraphs" 
      (genes) and by "pages" (operons) that can either be 
      "closed" (repressed) or "opened" (induced), according 
      to contingent "instructions" (repressor-corepressor 
      complexes) from "references" (regulator genes) on 
      earlier pages or in "books" of adjacent tissues." 
Although I admired this paper greatly, it was plain to me that,
with respect to eukaryotic processes, the model evaded a very 
fundamental question, namely, "How are the 'pages' turned?". 
In my quest for mechanisms that could serve this purpose I was 
driven to the conclusion that cellular processes absolutely have 
to be 'real-time' processes.  One of the unexpected fringe 
benefits of a 'real-time' model was that it suggests roles for 
some of the so-called "junk code" in the genome. 
I wrote a draft paper outlining the idea, but never got around 
to having it published.  The ideas, apparently, were too outre 
at the time, because not many people (and even fewer biologists)
understood much about information processing -- and those who 
knew anything about real-time systems were even sparser on the 
ground.  It may well be that, as Zola said, 'there is nothing as 
powerful as an idea whose time has come' -- but in my experience 
'there is nothing as impotent as an idea whose time has not yet
come'.
Your announcement about a network for prions and viruses started 
me thinking again, along slightly different lines. 
The exquisitely orchestrated, very precise processes carried out
inside the nucleus of a eukaryotic cell demand that the milieu
be very tightly controlled.  The chances of a random bit of DNA, 
introduced somehow into the nucleus, ever managing to get itself 
inserted into the genome in such a way that it is replicated seem 
to me to be negligible. 
Yet, we know that viruses achieve this routinely.  How can this be? 
It soon polymerized on me that I already had a plausible 
answer within my grasp and that with very minor modifications (to 
my original paper, not to the model) one could derive a logically 
satisfying (to me, at least) explanation of the origin of viruses. 
I propose that the answer might be that a virus is not (and 
never was) just a random bit of DNA.  It is able to seize control 
of the cellular replicating machinery because it has (coded 
within itself) "inside information" -- derived, no doubt, from a 
renegade ancestor, in a direct line -- who was once a member 
in good standing of the organization and so could have been privy 
to the detailed information necessary for success in this venture.
The ideas expressed here are entirely original.  Any feedback would 
be welcome.  My  network address is: 
                  <djex at LL.MIT.EDU> 
                        JUNK CODE & VIRUSES 
In higher organisms the nuclear DNA is complexed with proteins 
and some RNA, known collectively as 'chromatin'.  Not quite all 
the information in the cell is inherent in the chromatin; the 
organelles (mitochondria &c.) have to be taken into account too. 
For the purposes of this treatment, though, we shall accept 
without question the "dogma" that 'all the information needed to 
complete the organism, as well as the information that must be 
used by the developing organism to commence its interactions with 
its environment are inherent in the chromatin complex.' 
Information can only exist in a context -- it cannot exist 'in 
vacuo'.  For information to be of use, it must be retrievable if 
and when needed.  The role of a library, as a repository of 
information (an information base) can be fulfilled only if any 
given item of information in it is retrievable on demand.  If all 
the books in the Library of Congress were to be thrown 
haphazardly into a warehouse (or stored tightly packed in crates) 
it would no longer be a library and the information per se would 
effectively cease to exist. 
There is an enormous amount of information in the mammalian
genome.  This information is at the disposal of the cellular
machinery.  The cell must, however, be able to gain access to
whatever specific information it needs whenever it needs it.
Each of us comes into being from a single cell, a fertilized 
ovum.  Every time the egg divides, each daughter cell inherits a 
complete copy of the genome -- it inherits a portfolio of genes.  
Its nucleus contains all the genes which, encoded in DNA, specify 
all the different cells in the adult body.  At some specific time 
during development of the organism each cell specializes; it 
becomes a liver cell or a kidney cell or a neuron, say.  From 
that time on, all its daughter cells will be of the same kind.  
The formation of a specialized cell does not result from loss of 
genetic material; rather it follows from a change in the reading 
of the whole genome -- 'selective gene expression', as it is
called.
Once a cell has differentiated, its metabolic behavior is also
determined.  Even though the code in a liver cell, for
example, contains the 'programs' also used by a working kidney
cell, these are normally never invoked by the cell.  Only the
code that governs the metabolism of the liver cell can be allowed 
to be expressed without serious deleterious consequences. 
The nucleus must, therefore, embody specific regulatory 
mechanisms capable of activating and deactivating particular 
regions of the genome for RNA transcription and protein synthesis,
depending on the instantaneous state of the cell.  The emergence 
of the cellular machinery conferring the ability to express, 
selectively, different regions of the genome (i.e. different 
code) is what enabled metazoans to arise and evolve. 
Some knowledge of regulatory functions in prokaryotes has been
gleaned (e.g. the lac operon), but the mechanisms by which the
selection of genetic potential in the eukaryotic cell is
accomplished are still largely unknown and represent one of the
most challenging problems in modern biology.  In 1971, in an
editorial in 'Nature', it was declared that "the structure of the 
eukaryotic chromosome is the vital issue that must be resolved 
before research today in cell biology can produce a coherent set 
of concepts instead of a mass of unrelated data." Almost 20 years 
later, as far as I can tell, the problem is still largely 
It is my contention that prions and viruses represent part of the 
regulatory machinery that 'escaped' and mutated.  If this 
conjecture were to be established as fact, it would offer the
possibility that prion- and virus-like artifacts might be used as 
'probes' to elucidate the cellular machinery and give us greater 
insight into their depredations within the cell. 
It is generally accepted that information is encoded in the
sequences of nucleotides that constitute the DNA in a eukaryotic
cell.  The details of the triplet code are now well known, and the
process of transcription, during which the encoded information is
precisely copied into complementary strands of RNA that
direct the synthesis of specific proteins, is well understood.
The codes for proteins constitute only a part of the genome.  One 
of the most awkward facts to account for when analyzing the heredity 
of higher organisms is their great excess of DNA; the amount varies 
with the species, of course, but there always seems to be far more 
in the genetic material than can be accounted for by the sum of the
codons needed for protein production.
Some stretches of the 'redundant' code are thought to be regulators 
which govern the production of protein (analogous to operons in 
prokaryotes).  In addition there is a large amount of repetitive
code which seems to serve no purpose.  Some biologists
refer to this component as "junk code." 
In attacking the problem of regulation, the first question is one 
of strategy: how should one attempt to resolve the issue? 
Since information is the currency of genetic transactions, it
seems natural to try to consider the problem from an information- 
processing point of view. 
Nature (if I may be permitted an anthropomorphic metaphor) is a 
tinkerer, not a designer.  The structures that we uncover are 
"Rube Goldberg" contraptions -- superbly engineered and optimised 
through the agency of natural selection, but kludges 
nevertheless.  Experience with computer systems, which are orders
of magnitude simpler, shows that it is supremely difficult to
fathom the logic behind such 'ad hoc' constructions "from the 
bottom up".  Is there perhaps a different approach with greater 
Starting with some rather basic assumptions, a case will be made 
for selecting a particular information-processing structure.  A 
model of this structure will be described and some consequences 
will be drawn from the given model.  Finally, it will be shown 
that the cell supports processes similar to those required by the 
     (1)  The physical information structure which resides in the  
          genomic DNA is LINEAR (or, at most, closed in the form   
          of a ring); IT IS NOT BRANCHED OR STRUCTURED IN ANY      
          OTHER WAY. 
     (2)  Processing of the information in the DNA does not start  
          at some (global) 'beginning' and proceed sequentially    
          (endlessly) from there on.  That is to say, even though  
          a particular stretch of code is expressible (locally)    
          sequentially, for a specific protein, the transcription 
          site for the next product need not necessarily be 
          adjacent to it. 
     (3)  Nor is it random. 
     (4)  The program embodied in the genome must be responsive    
          to patterns of input 'signals' from three levels: 
             o   intranuclear 
             o   intracellular 
             o   intercellular 
These assumptions imply that the logic of the process is BRANCHED 
even though the code for the process is LINEAR. 
An example of a real-time reaction might be the response of a 
cell to adrenalin. 
If we accept Herbert Simon's argument that biological systems 
have to be hierarchical because there has not been enough time 
for any other kind of system to evolve, then 'execution' of the 
'program' embodied in the chromatin could be represented by a 
branching structure or hierarchical 'tree'. 
Such an executive system would be capable of mapping a linearly- 
ordered physical information structure into a logically-ordered 
branched 'time-series' of processes.  To do this, it must be able 
to 'address' specific segments of the program (code) as needed. 
For this hypothesis to be viable, it is necessary to show that 
there exist mechanisms capable of accomplishing this feat. 
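
The distinction between linear storage and branched logic can be
sketched in a few lines of Python.  This is only a toy illustration
under stated assumptions: the segment names, the two signal values and
the transition table are all hypothetical and are not taken from any
biological data.

```python
# Toy illustration only: every name below is hypothetical.
# The "genome" is stored as one linear list of segments.
GENOME = ["exec", "A", "filler", "B", "filler", "C", "D"]

# The logic is branched: the segment expressed next depends on the
# current segment and on an incoming signal, not on physical adjacency.
NEXT = {
    ("A", "low"): "B",
    ("A", "high"): "C",
    ("B", "low"): "D",
    ("B", "high"): "D",
    ("C", "low"): "D",
    ("C", "high"): "D",
}


def run(start, signals):
    """Return the order in which segments are expressed for a signal series."""
    trace = [start]
    current = start
    for signal in signals:
        following = NEXT.get((current, signal))
        if following is None:  # no branch defined: the process stops here
            break
        trace.append(following)
        current = following
    return trace


def positions(trace):
    """Map each expressed segment back to its place in the linear store."""
    return [GENOME.index(name) for name in trace]
```

For example, run("A", ["high", "low"]) visits A, C and D, which sit at
positions 1, 5 and 6 of the linear list, while run("A", ["low", "low"])
visits A, B and D at positions 1, 3 and 6.  The same linear store
yields different time-series depending on the signals.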
Because of the addressing structure (the memory organization) and 
the sequential nature of programs in a digital computer, the 
executive can invoke a specific process at will by 'pointing' to 
it -- transferring control to it by reading the appropriate 
address into the program counter.  The genome does not appear to 
have any such addressing structure;  it is, so to speak, 
'diffuse'.  One way of achieving the desired effect in a diffuse 
structure would be to seal off all code except that for the 
process called for at the moment. 
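
The contrast between the two addressing schemes can be sketched as
follows.  This is a minimal illustration, assuming a made-up list of
segment names; the class and function names are hypothetical.

```python
from dataclasses import dataclass


def jump(memory, address):
    """Computer-style addressing: go straight to a numbered location."""
    return memory[address]


@dataclass
class Segment:
    name: str
    sealed: bool = True  # sealed segments cannot be read


class DiffuseStore:
    """A store with no addresses: code is selected by sealing off the rest."""

    def __init__(self, names):
        self.segments = [Segment(name) for name in names]

    def expose_only(self, wanted):
        """Seal every segment except those named in `wanted`."""
        for segment in self.segments:
            segment.sealed = segment.name not in wanted

    def readable(self):
        """Scan the whole store; only unsealed segments are visible."""
        return [s.name for s in self.segments if not s.sealed]
```

With a store built from ["liver_enzyme", "filler", "kidney_pump",
"filler", "executive"], calling expose_only({"liver_enzyme",
"executive"}) leaves only those two segments readable.  Nothing is ever
"pointed to"; the selection is made entirely by what is hidden.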
This seems to be the method that actually evolved.   The coiling 
and supercoiling of the paired DNA strands are the means whereby 
only certain sites are allowed to be active at any time.  Code 
that is not meant to be expressed at that time is 'hidden' 
within the coils -- only code meant to be expressible in the 
given context is exposed by the local uncoiling of the DNA 
strands.  Such an arrangement would call for 'filler', to keep 
unexpressible code far enough away to be inaccessible, and this 
filler may be an important component of the so-called 'junk'
code.
It is immediately obvious that a liver cell, for example, could 
become specialized by sealing off forever (by phosphorylation?) 
all code not specific to the metabolism and replication of liver 
cells.  The executive system could orchestrate cell activity by 
opening and closing sections of code selectively.  An agent that 
interfered with the seals and allowed 'outlaw' code to be 
expressed during replication might cause tumors or cancerous
cells to develop. 
Because of the real-time nature of the system, the executive 
process itself always has to be 'resident' (i.e. available) to 
avoid the condition that programmers refer to as the 'deadly 
embrace'.  To illustrate this condition, consider the analogous 
problem of executing a very large program in a computer with a 
disk drive but very limited random access memory.  Each 
successive program segment has to be read in as needed.  The disk 
driver -- that process which actually causes the data to be read 
from the disk into memory -- always has to be resident in memory.  
If it were to be inadvertently swapped out to disk, a 'deadly 
embrace' would result because now there would be no way to read 
in the next segment.  By the same token, the 'code' for the processes
that, when expressed, cause the currently-open segment of the genome
to wind up and open the next appropriate segment always has
to be available when needed.  Operationally, this means that
every lowest common denominator of open code has to contain its 
own copy of the executive process, so there will have to be a 
multiplicity of copies of the executive system distributed 
throughout the system. 
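
The disk-driver analogy can be simulated with a toy machine.  This is a
sketch under stated assumptions: the Machine class, its slot count and
its oldest-first eviction rule are all hypothetical, chosen only to show
what happens when the loader is or is not kept resident.

```python
class Machine:
    """Toy machine whose memory holds only `slots` segments at a time."""

    LOADER = "loader"  # stands in for the disk driver / executive process

    def __init__(self, slots=2, pin_loader=True):
        self.slots = slots
        self.pin_loader = pin_loader
        self.memory = [self.LOADER]  # oldest entry first

    def load(self, segment):
        """Read a segment into memory; this requires the loader to be resident."""
        if self.LOADER not in self.memory:
            raise RuntimeError("stuck: the loader itself was swapped out")
        if segment in self.memory:
            return
        if len(self.memory) >= self.slots:
            self._evict()
        self.memory.append(segment)

    def _evict(self):
        """Drop the oldest entry, skipping the loader when it is pinned."""
        for index, resident in enumerate(self.memory):
            if resident == self.LOADER and self.pin_loader:
                continue
            del self.memory[index]
            return
        raise RuntimeError("nothing can be evicted")
```

With pin_loader=True the machine can load any number of segments in
turn.  With pin_loader=False and two slots, the loader is evicted when
the second segment arrives, and the third load fails because nothing
remains that can read it in.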
A 'bug' in the program (a mutation, that is) in the sequence
coding for a protein may or may not be lethal.  If it is not lethal,
it might be neutral or even beneficial -- or it might have delayed 
effects, causing complications later (e.g. sickle-cell anemia). 
Executive systems (judging from experience with computer
systems) are far less tolerant of aberrations.  An error in the
executive process is most likely to be lethal; so one could
expect the code to be highly conservative.  This means that the
multiple copies of the executive system are likely to be very 
similar, providing another source of repetitive non-protein- 
forming code. 
The executive process itself, furthermore, may not be monolithic, 
but may itself need to be distributed.  This probably would 
entail a good deal of filler, adding to the non-functional 
Five (of eight) histones have been isolated and sequenced from
a wide spectrum of eukaryotic species, suggesting that they 'froze
over' very early in the history of eukaryotic organisms.  One might
expect that the executive 'machinery' -- playing, as it does, 
such a fundamental role in cellular function -- would have had to 
have come into being equally early on.  Indeed, the histones,
needed for coiling the DNA, are very much part and parcel of the
same regulatory processes.  It follows that the regulatory machinery
would be equally widespread and at least as conservative.  A search 
for such invariant processes of repetitive DNA would pinpoint the 
sections of the genome that represent the executive system and 
isolate them for further study. 
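
A first step in such a search might look like the sketch below.  It is
a simplification: it finds only exact repeats of a fixed length, whereas
a real search would have to tolerate minor variants.  The function name
and parameters are hypothetical.

```python
from collections import defaultdict


def repeated_kmers(sequence, k, min_copies=2):
    """Return each length-k substring that occurs at least `min_copies`
    times, mapped to the list of positions where it starts."""
    starts = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        starts[sequence[i:i + k]].append(i)
    return {kmer: where for kmer, where in starts.items()
            if len(where) >= min_copies}
```

Applied to the made-up sequence "ACGTTTGACGTAAGACGT" with k=4, this
reports "ACGT" at positions 0, 7 and 14 and "GACG" at positions 6
and 13.  Candidate segments found this way in one species could then be
compared across species to test how widely they are shared.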
Segregation of function in membrane-limited nuclei,
mitochondria and plastids is another hallmark of eukaryotic
organisms.  The separation of genes for complex organellar
elements may be a general principle of organelle and eukaryotic 
biology.  In the interests of efficiency, probably, these 
organelles (e.g. mitochondria) have had some of their functions 
(and the associated code) taken over by processes in the nucleus.  
Mitochondria, for example, no longer make their own membranes.  
Why, then, have they retained any code at all?  A plausible 
answer is given by this model. 
An organelle may have to carry out some functions that cannot be 
subservient to the current process in the nucleus.  That is to
say, it has to carry out its process (respiration, for example)
irrespective of what is happening in the nucleus at the time.
By executing its own code independently, it becomes an
'asynchronous' (satellite) processor, performing its appointed 
function irrespective of the instantaneous state of contemporary 
nuclear processes. 
It seems likely that the logical tree in the DNA has three major 
branches, each one controlling a specific function: 
     o   development 
     o   replication 
     o   metabolism 
The Principle of Parsimony suggests that the developmental and 
the replicative processes might share some common code.
The executive program is more than just a switching network -- it 
is a dynamic process.  It contains information it uses (and 
modifies) to determine its pathways depending on the 
instantaneous context. 
Certain predictions can be made from this model: 
     o    The chromatin will contain a large amount of repetitive  
          code, some of which (filler code) may seem non-
     o    Some of this repetitive code is functionally equivalent 
          to an executive process control system 
               -    which is highly conservative, and therefore, 
                    may seem 'primitive' 
               -    will be found (modulo minor variants) across 
                    a very wide spectrum (if not all) eukaryotic 
     o    some of the non-histone chromosomal proteins are not 
          for export, but are generated solely for control 
          purposes within the nucleus 
          - A DNA sequence which generated such a protein, if it 
          escaped, might be the precursor of a virus. 
     o    some intranuclear RNA may play a similar role.  The
          role of reverse transcriptase is to make this possible.
          - An RNA sequence which generated such a protein, if it
          escaped, might be the precursor of a retrovirus.
This model allows for evolutionary change in anatomy and way of 
life to be based on changes in the information controlling the 
expression of genes as well as point mutations in protein- 
producing genes.  So it is possible for species such as humans 
and chimpanzees to differ so substantially in anatomic detail and 
way of life and yet have proteins that are 99% similar. 
                                             Max ben-Aaron. 
                                 in care of <DJEX at LL.MIT.EDU>


The "Prion Digest" is a Usenet distributed e-mail list, compiled from
postings to it, and distributed weekly (current plan is for early Sat.

While the main goal of the digest is to provide a resource for
researchers working with prions and interested bystanders, all are
welcome.  All articles posted will be included in the next digest.  If
a poster feels that his posting is of an urgent nature, it may be
distributed sooner than the regular digest.  If you want to post an
"urgent" message send it to the prion-request address, not the prion

All requests regarding administrivia (subscriptions, cancellations,
comments, etc.) should be mailed to the moderator 
<prion-request at>.  All postings to the digest should be
directed to <prion at>.

There are archives of all back issues available via anonymous ftp from
beowulf at ( in the pub/prion directory.  If
you do not have ftp access, please write <prion-archive at>
and back issues will be mailed to you.

	-- Chris Swanson (Prion Digest Moderator)  <swansonc at>


End of Prion Digest
