How the Brain Works and More

Paul Bush paul at
Sat Apr 20 19:06:21 EST 1996

Here is an essay that
describes how I have come to understand reality. First I shall
explain the brain, to show that the theory is grounded in reality. I
will then go on to show how this explanation leads us to an objective
framework for knowledge representation. This theory has interesting
philosophical implications.

OK, let's begin. How does the brain work?

The details of the answer are staring us in the face. To paraphrase
Carver Mead, space and time are their own (abstracted) representation.

To understand how the brain works we must first understand the
question. What does the brain do? The answer, as we all know, is that it
moves genes forward in time. That is, it is a machine sitting inside a
machine, directing its actions in order to ensure survival. Viewed
from this perspective, we see that anything that happens to the brain
that it can use to complete its function will be used. This is the
fundamental principle of brain operation.

The following description will get very technical (high level of
detail). Only those with some neuroscience knowledge will fully
understand it, though hopefully the main point will be clear to all.

Let's start with the cortex. The cortex (+ some other structures such as
the basal ganglia and thalamus) is responsible for (abstract) thought, 
consciousness and other intelligent mental processes. The function of
the neocortex is to construct a model of reality. A literal physical
model. The cortex literally models the world in information space. One
neuron represents one quantal idea. Wild, eh?

How does this work? The cortex is a modular structure, with a basic
circuit that is replicated many times. We must thank Kevan Martin and
Rodney Douglas for this idea. We already know what this basic circuit is,
albeit at a very low level of detail. This circuit is seen, with some
minor variations, in all mammals from rats to humans, and in all
cortical areas from primary sensory cortex to primary motor cortex via
frontal 'association' cortex. The same classes of component cells are,
as far as we presently know, also seen across species and modalities. 
This incredible ubiquity means that the basic function of cortex is
independent of the demands of any particular animal, and independent
of the nature of the data that it is processing: The basic cortical
circuit that 'processes information' in the sensory cortex of a cat when
its leg is touched is the same circuit that 'processes information' in
the frontal cortex of a human when it thinks about calculus. This
means that the function that the cortical circuit performs must be very
general indeed: Nature has found a brain circuit that can
accept information about anything, abstract the relevant regularities and
invariances and incorporate them into its structure, then use the resulting
structure to very quickly (perhaps purely feedforward) provide 
relevant/correct output for any input.

As we all know, the cortex abstracts features. (feature - a prominent
part or characteristic). Patterns in space and time. Information. In
the visual system these features are simple characteristics of the
visual world. The actual features abstracted do not matter (cf hidden
units in ANNs). All that matters is that something regularly appears
in the real world that is captured (modeled) by the cortex. The future
activation of that feature means (to the owner of the brain) that that
feature is present again in the real world. The significance of the
feature in terms of the goals of the brain is not determined by the
cortex. The cortex just indicates that this feature is (believed to
be) present. The cortex is the ultimate rationality engine.

In higher cortical areas these features are called concepts (concept -
a collection of features). The relationships of concepts and ideas to
one another are literally the synaptic connections between neurons in
higher cortical areas.

How does this work? From the above, we can picture an almost homogeneous
sheet of cortex over the surface of the brain. It is a sheet of
replicated modules, each composed of many individual neurons. The
structure of the module gives us clues to its function. The input
comes from a layer lower down in the space of complexity. This input
is information about the real world. It comes in the form of binary
spikes distributed over a number of fibers (axons). Each spike in each
axon signals the presence of a specific feature detected in the layer
below. The patterns (in space and time) in these spikes are the
features to be abstracted by the module in
question. The neurons in one module compete for features, I think as
predicted by G. Edelman (though I have not read his work on the
subject). Each neuron has thousands of synapses on it, though only a
few tens are needed to cause it to fire. The few lucky synapses are
selected by a process of correlation, Hebbian learning: If the neuron
fires it is because it has received a combination of inputs
(representing some feature pattern) from a lower layer. The
correlation is between those inputs and the target neuron. The
synapses from these inputs to the target neuron are now increased -
the neuron now represents that pattern of inputs. If it fires again it
is a sign that that pattern is present again. For competition to occur
there must be something that decreases other synaptic weights. Some
normalization process. This could be a cloud of nitric oxide released
by the target neuron (as studied by Read Montague) or just a continual
decay of all synaptic weights over time. I favor the latter hypothesis.
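This selection process can be sketched in a few lines of code. What follows is a toy illustration of my own making - Hebbian strengthening of correlated inputs plus continual decay of all weights as the normalization step. All the constants (learning rate, threshold, decay) are arbitrary, not measured cortical parameters:

```python
import numpy as np

# Toy sketch of the competitive Hebbian scheme described above.
# All constants (learning rate, threshold, decay) are arbitrary
# illustrations, not measured cortical parameters.

rng = np.random.default_rng(0)

def hebbian_step(weights, inputs, lr=0.1, decay=0.01, threshold=0.1):
    """One update: strengthen synapses that drove the neuron to fire,
    then let every weight decay slightly (the normalization that
    allows competition between input patterns)."""
    fired = weights @ inputs >= threshold
    if fired:
        weights = weights + lr * inputs    # Hebbian strengthening
    return weights * (1.0 - decay), fired  # continual decay of all weights

# A feature pattern that "regularly appears in the real world".
pattern = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=float)
w = rng.uniform(0.0, 0.2, size=8)          # weak initial synapses
for _ in range(200):
    w, _ = hebbian_step(w, pattern)
```

After repeated presentations the synapses carrying the pattern come to dominate the weight vector while the unused ones decay toward zero - the neuron now represents that pattern of inputs.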

There are also connections between neurons within a module. These
connections also aid in model building. Through the same process of
Hebbian learning, neurons within a layer strengthen their connections
with neurons that fire with them. Neurons that represent similar
features. Thus all neurons in a module respond to similar features,
but each responds to a slightly different feature than its
neighbors. These neurons in turn compete to be selected by a neuron in
a higher layer in complexity space. Long-range horizontal cortical
connections extend the volume of information space analysed by a
particular layer.

Neurons with similar selectivity cluster together to compete for the
set of features being delivered to their approximate position. This
explains maps, areas, columns and blobs. In fact the theory predicts
that any concept contained in the brain/mind is instantiated as at least
one neuron in a cluster. (It seems Dennett is right, though I haven't
read his books on the subject. However, the reason I got into this
subject is because I read 'The Mind's I', so I guess he contributed to the
theory). The more complex (more features) the concept, the more
neurons there are that represent it.

This feature modeling happens all over the cortex (sensory, motor,
frontal etc) - the cortex simply abstracts
whatever features are useful in behavior, whatever the modality. All
the time. The cortical model is continually updated. The work of Mike
Merzenich and others has shown that repeated stimulation induces
physical changes in the cortex - the area of cortex representing the
stimulus expands. It competes with surrounding areas and wins due to
the increased input. More neurons are assigned to the stimulus. Thus
we can conceive of the cortex as a continually shifting model of the
world, a map of reality constantly morphing over the surface of the
brain on a scale of mm over the lifetime of an individual and cm over
the lifetime of a species. Anything important in the world is mapped.
We can also see that a higher level neuron has access to input from
the whole feature space at lower levels (visual RFs increase in size
from V1 up), as each neuron in a layer selects from hundreds or
thousands of inputs from the layer below. D. Van Essen and others have
called this the 'shifter hypothesis' (though again I haven't read the
papers on it).
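A cartoon of this map competition: neurons are won by whichever stimulus drives them most strongly, and repeated stimulation boosts one stimulus's drive. The winner-take-all assignment rule and the 1.3x boost are my own illustrative assumptions, not Merzenich's actual data:

```python
import numpy as np

# Cartoon of cortical map competition. Each neuron is assigned to
# whichever stimulus drives it most strongly; repeated stimulation
# boosts one stimulus's drive, expanding its territory.

rng = np.random.default_rng(1)
n_neurons, n_stimuli = 50, 5
drive = rng.uniform(0.9, 1.1, size=(n_neurons, n_stimuli))

def territory(drive):
    """How many neurons each stimulus currently 'owns'."""
    winners = drive.argmax(axis=1)
    return [int((winners == s).sum()) for s in range(drive.shape[1])]

before = territory(drive)
drive[:, 2] *= 1.3            # stimulus 2 is repeatedly presented
after = territory(drive)      # its representation expands
```

More neurons are assigned to the boosted stimulus at its neighbours' expense, while the total number of neurons stays fixed - the map morphs rather than grows.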

So what does it do with this model? It predicts the future. The
connections within a layer describe relationships between features in time
as well as space. Thus an input feature pattern activates some neurons
in a layer. These neurons activate others in the same layer (as well
as higher layers), which then activate others etc etc. M. Abeles has
described such a process as synfire chains. Each neuronal firing
represents the brain's belief/prediction that its associated feature
is present in the real world, because that is what happened
before. This is the basis of associative memory - each concept/neuron
activated causes the activation of a nearby (in feature and physical
space) concept/neuron. The exact time differences in firing over long
time periods recorded by Abeles reflect exact relationships between
features in the real world that the neurons represent.
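A minimal sketch of this chaining, assuming a learned weight matrix that records which feature has followed which in past experience (the particular chain 0 -> 1 -> 2 -> 3 -> 4 is an invented example):

```python
import numpy as np

# Minimal sketch of associative chaining ("synfire"-style): each
# neuron stands for one feature, and a learned weight matrix records
# which feature has followed which in past experience.

n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i + 1, i] = 1.0        # feature i has reliably preceded i+1

def step(active):
    """Active neurons drive the neurons they predict will fire next."""
    return (W @ active >= 1.0).astype(float)

state = np.zeros(n)
state[0] = 1.0               # feature 0 observed in the real world
history = [int(state.argmax())]
for _ in range(n - 1):
    state = step(state)      # each firing is a prediction of the next
    history.append(int(state.argmax()))
# history is now [0, 1, 2, 3, 4]: the learned sequence replays
```

Each activation is the brain's prediction that the associated feature is about to be present, because that is what happened before.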

While we are awake, the cortex is continually receiving input, which is
filtered up to the highest layers. These layers contain our most
complex representations - the highest level at which we understand the
world. They are the basis for our behavior. These layers, probably
through interactions with other brain regions (esp basal ganglia),
make decisions. They determine our perception of reality. They
also determine our future perception of reality.

How does this work? Besides input connections and within layer
connections, there are also feedback connections. Each cortical layer
projects back to the layer it receives input from. If a high level
neuron is active, it activates the area of feature space that usually 
activates it - it is making a prediction about what features are/will
be present in the real world. Feedback connections are generally
diffuse - they enter cortical layer 1 and synapse over a wide area. This is
because the higher level neuron only activates the general area of
feature space that it receives its inputs from - the prediction is not
completely precise. This feedback information is received by
wide-spread apical dendrites of neurons in cortical layers 2/3 and
5. Apical dendritic synapses are far 
from the cell body. The apical inputs are summed up and 'fired' down
the apical dendrite by active conductances. Thus the prediction is not
precise in time as well as space. In contrast, synapses within and
into layers are made precisely, close to the cell body. The cell can
tell which particular features caused it to fire. Thus feedforward 
information (from the real world) is more precise than feedback
(brain's internal prediction). This makes intuitive sense, I
think. For any active region of cortical physical/feature space, 
precise input from the real world is superimposed on a background of
prediction. The repeated presentation of the same input pattern will
change the synaptic structure to reflect its component features. Note
that any area receiving input without concurrent feedback will have a
hard time activating cells and being learned. The input will have to
be strong and repeated. This means that the brain
has a hard time accepting things it does not believe in.
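Here is a toy version of that precision asymmetry: feedforward input is a sharp spike in feature space, while top-down feedback is the same expectation blurred over a neighbourhood. The blur kernel and threshold are my illustrative assumptions:

```python
import numpy as np

# Toy version of the precision asymmetry described above. Feedforward
# input is a sharp spike in feature space; top-down feedback is a
# blurred (diffuse, imprecise) version of the same expectation.

def blur(signal, width=2):
    """Spread a prediction over nearby positions in feature space."""
    kernel = np.ones(2 * width + 1) / (2 * width + 1)
    return np.convolve(signal, kernel, mode="same")

n = 20
feedforward = np.zeros(n); feedforward[7] = 1.0   # precise real-world input
prediction = np.zeros(n);  prediction[7] = 0.5    # what a higher area expects

with_feedback = feedforward + blur(prediction)    # expected input
without_feedback = feedforward                    # unexpected input

# An expected input crosses threshold more easily than the same input
# arriving without concurrent feedback.
threshold = 1.05
fires_expected = with_feedback[7] >= threshold      # crosses threshold
fires_unexpected = without_feedback[7] >= threshold # falls short
```

The input that arrives on a background of prediction activates the cell; the same input alone falls short - the brain has a hard time accepting things it does not believe in.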

We can now see how memory works: Synaptic connections are your
memories - connections between concepts. They decay over time (short
term memory), but are built up with use. Each night when you sleep,
all synaptic connections that have received heavy use are strengthened
(consolidation). This is why dreams contain concepts that you have
been thinking about a lot during the day - sleep provides random
'test' inputs to cortex (PGO spikes?) which activate these heavily
used concepts (strong synapses). If connections are strengthened
enough, they become (semi) permanent (long term memory).

Dynamics - if you want you can skip the next two paragraphs. They are only
speculations about data. I think many aspects of the cortex are not 
directly related to its function - instead they deal with the physical
realities of the brain's hardware limitations. Thus, I think inhibitory
neurons provide gain control (also reverse sign of a cortical
connection - to signal anticorrelation), layer 4 performs
some kind of transformation of the input to put it into the right
dynamic range, layer 6 is the 'gatekeeper' to each module (3 component
loop - layer 6, layer 4, lower input module). Layers 2/3 and 5 are the
'important' layers, their synapses are the memories. I think real
world input and top-down prediction are combined in the upper layers,
then the module's 'simulation' percolates vertically down to layer 5,
where it is output. Layer 2/3 projects to higher areas 'before the
simulation is complete' to quickly give higher layers a
prediction/input. Layer 5 has fully completed the simulation and
outputs the result to a lower brain center to produce action.

Why does each cortical area have an output, instead of waiting till
the final picture is established? You can see by now that
cortical areas are arranged in progressive order of complexity
(expertise). We go from small lines in V1 to faces in IT. However, the features
extracted at each stage can still be useful to help the rest of the
brain generate behavior. Thus, macaque monkeys have huge V1s
relative to the rest of their cortex. It should be clear by now that
increasing the size of an area at any one level increases the
resolution of analysis at that level. Macaque vision is very high
resolution, thanks to the size of V1. V1 outputs to the superior
colliculus, which controls eye movements. Perhaps V1 is large to
provide very high resolution eye movements, to foveate stimuli -
spatial resolution in the retina is much lower even very small
distances from the fovea, so exact foveation is important. If this is
true then V1 is just telling the SC that a stimulus is present at a
particular location, with high resolution.

So what do we have? Cortex is a reality model builder - it just
constructs a space/time model of the world, nothing else. So where
does behavior come from? Motivations, drive? The answer, of course, is
from the rest of the brain. Lower brain structures send
inputs to cortex that direct behavior. For example, the mesolimbic
dopamine system is a tentative candidate for the pleasure/reward
system. Simply put, if a behavior performed by the cortex benefits the
animal (stimulates the reward center), this system fires and
strengthens the cortical synapses that resulted in the behavior. Add a
pain system that penalizes incorrect behavior (pain centers are spread
throughout the brainstem - negative feedback is vital for survival)
and you have an organism that can learn to advance its position in the
world. But for what reason? The hypothalamus contains neurons
sensitive to blood sugar levels, osmotic balance, sex hormone levels
and many other drives that induce behavior. All these systems coupled
together produce an organism that is able and motivated to interact
with the real world. There is only one thing missing:
Consciousness. For the answer to this we must turn to Francis
Crick. You have to admire Francis Crick. He helped crack the genetic code
over 30 years ago and now, through massive determination and
intelligence, he has solved consciousness. The seat of consciousness
is the limbic association cortex (cingulate gyrus). It is here that
the lower brain centers send input when they are activated. The
pattern of activation of these inputs in the cingulate cortex is
modeled just as any inputs in any cortical area are modeled. Except
the resulting model here is called sensation. For example, the
amygdala sends projections to all the brainstem centers that directly
cause fear behavior (sweating, increased heart rate etc) as well as to
the cingulate cortex. Activation of the amygdala causes the emotion known as
fear. The action of the amygdala's outputs in the cingulate gyrus
causes the sensation of fear. Lesion of the amygdala removes fear -
such animals cannot be frightened. Lesions of the limbic (cingulate)
cortex cause loss of conscious perception - such animals still live
and function, but they have no emotions. They have no affect.


Well, that's the brain in a nutshell. So what implications does this
theory have? I think it tells us something about the structure of
knowledge, and thus the structure of reality. Consider the cortex
constructing a high level model of the world - the one we call our
belief structure. Each belief is a collection of neurons very high up
in complexity space. The synaptic connections between the neurons
directly represent the real world relationships between the concepts
that make up the belief.

Someone's cognitive model of the world changes when they perceive and 
understand something new. What does it mean to understand something new?

Understanding a new concept fundamentally involves (is defined by) relating
that concept to concepts that already exist in the subject's cognitive
model of the world (I use the term 'subjective world' (sW)). The
structure of someone's sW reflects all their beliefs about
reality. The concepts in our sW's are organized: Concepts that
represent basic features of the world are 'deeper' than concepts that
represent complex features because the latter are defined in terms of
the former. One literally understands things in terms of the concepts
that _stand_ _under_ them.
There can be relationships between concepts more than one level apart - some
properties of very high level concepts can be explained by low level concepts.
Relationships between concepts at deeper levels reflect very fundamental facts 
about the nature of the world. Thus the best explanation of any concept is to 
identify its relationship to the lowest possible level in the sW. The better
the explanation, the more properties of the concept are related to some
parts of the sW. To be a full explanation, all properties of the concept 
must be so related. By definition, this concept would be fully understood.
Conversely, an explanation is bad if it can only identify a few relationships
between the concept and the sW. If too many properties of the concept have 
no relation to any part of the sW, that concept is unexplained.

What does this mean in neural terms? Concepts are neurons,
literally. The defining relationships between concepts are synaptic
connections between these neurons. Conceptual layers are
different cortical areas - more basic sets of concepts define
higher level concepts; activity patterns in neurons in lower (in
complexity space) levels cause specific neurons to be activated at
higher levels. Understanding something new is a change in your belief 
structure - a change in the synaptic connections between neurons
representing the concepts at issue. The longer a connection has been
in existence, the harder it is to change. This is partly because new
levels are always being built on top of old ones. In the process of
conceptual learning we use newly acquired concepts to later define even
newer ones. Because the activity patterns are very complex, their
structure cannot be changed arbitrarily, especially at very deep levels. 

Neural equivalent of 'explanation': When trying to learn a new
concept, or change the structure of an old concept,
(change activity patterns to activate new/different neurons),
concepts close in conceptual space are presented together with the new
concept (similar activation patterns are concurrently presented). If
the new concept 'fits' with the similar concepts (better than the old
one did) then it is believed (if the new activity pattern and similar activity
patterns reinforce each other (better than the old one did) then the
synaptic weights between its constituent neurons will be strengthened
(at the expense of the old one's)). If no similar patterns exist, or
if the new pattern is too different, then the new pattern cannot be learned. 
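A toy version of that fitting process: a new activity pattern is adopted only if some pattern already in the sW is similar enough to reinforce it. The cosine-similarity gate and the 0.5 threshold are my stand-ins for that mutual reinforcement:

```python
import numpy as np

# Toy version of fit-gated learning: a new activity pattern is
# incorporated only if an existing pattern in the sW is similar
# enough to support (reinforce) it.

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def try_to_learn(sW, new_pattern, fit_threshold=0.5):
    """Incorporate new_pattern only if an existing pattern supports it."""
    if any(cosine(p, new_pattern) >= fit_threshold for p in sW):
        sW.append(new_pattern)
        return True
    return False              # too different: cannot be learned

sW = [np.array([1.0, 1.0, 0.0, 0.0])]        # one existing concept
near = np.array([1.0, 0.8, 0.3, 0.0])        # slightly beyond the sW
far = np.array([0.0, 0.0, 1.0, 1.0])         # genuinely novel

learned_near = try_to_learn(sW, near)        # fits: incorporated
learned_far = try_to_learn(sW, far)          # no foothold: rejected
```

The slightly-novel pattern is learned because it overlaps the existing structure; the genuinely novel one has nothing to attach to and is lost.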

We can also see that the most efficient way to learn something new 
(whatever it is) is to attempt something slightly harder than what your
present ability allows. Trying something too hard will not help you
for the same reason that you cannot perceive totally new events - you
have no relevant brain representation to modify or extend.

Note that neurally, understanding and perception are the same
process. If a new pattern is too different, it cannot be learned. This
means that if a stimulus is presented that is very different from
anything encountered before, then it cannot be perceived. Remember
that a part of this is top-down feedback, described above, which is
continually imposing its prediction on all lower layers,
biasing them to fire as they have always done. This contributes
to the inability to perceive and understand genuinely novel events.

Thus there are serious constraints on our perception and understanding
of reality. In addition to the above, the motivational systems of the
lower brain (described above) strongly influence what is perceived and
remembered, through the process of attention (thalamus). We can only
perceive/understand things similar to what we already know and things
that are important to us. We don't learn each occurrence of a similar
event, instead after the initial event we just learn the differences
to create a prototype (Very similar neural activity patterns reinforce
common input synapses (features) to an asymptote).
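Prototype formation of this kind can be sketched as a leaky running average - my stand-in for asymptotic reinforcement of common features, with invented events and rate:

```python
# Sketch of prototype formation: repeated similar events are not each
# stored; instead, shared features are reinforced toward an asymptote.
# The leaky running average below is a stand-in for that process.

def update_prototype(prototype, event, rate=0.2):
    """Move the prototype a fraction of the way toward each event:
    common features converge, idiosyncratic ones wash out."""
    return [p + rate * (e - p) for p, e in zip(prototype, event)]

prototype = [0.0, 0.0, 0.0]
# Twenty repetitions of three near-identical events: the first two
# features are always present, the third is noise.
for event in [[1.0, 1.0, 0.0], [1.0, 1.0, 0.1], [1.0, 1.0, -0.1]] * 20:
    prototype = update_prototype(prototype, event)
# The shared features approach 1.0; the noisy feature stays near 0.
```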

So how does the cortical world model get built up over the lifetime of
an individual? When a mammal (though I will focus on humans) is born the cortex
starts as a blank slate. However, it's not really
blank due to some genetically ordained + in utero organization. This
organization 'bootstraps' the cortex. It reflects the most basic
features about the world, eg the fact that the elements of visual space are
locally correlated (lines and surfaces exist), rather than
uncorrelated (like the white noise of an untuned tv set). Apart from this,
the new cortex has had no experience of the world, therefore it has
extracted no features of the world, therefore the neonate has no
conception of the world. This means, that the neonate cannot
understand or perceive anything. Here is where the bootstrapping comes
in - the only things that the neonate cortex can understand and
remember are experiences that build on the initial (prenatally
created) representations. That means very basic features of the world,
like the facts that discrete physical objects exist, sounds can be
correlated with these objects, and the fact that the neonate can
physically interact with these objects. The neonate cannot understand
or perceive anything beyond these concepts. Thus it is impossible for
an adult to remember experiences of neonatal life - concepts/features that
would seem meaningful to us are simply not perceived by the neonate.

In the early stages of cortical model building, no
'direction' is necessary. The sW being constructed is at such
a basic level that all features of the input extracted and perceived
are important enough to be incorporated. During this phase the infant
exhibits behavior designed to speed up the construction of the sW - it
produces lots of random activity (motor, verbal etc) which generates
sensory data about the nature of 1) sound, vision etc, 2) the infants
relationship with the world. Infants are relatively hyperactive -
the faster they generate and take in data, the faster the development
of their world model. As infants develop into children, and the basics
are all established, this drive develops into curiosity. Curiosity is
the drive to actively extend the boundaries of the sW in knowledge
space. Learning is the process of modifying and
extending the existing sW in the following way: New concepts are constantly
encountered. Those that can be interpreted in terms of the existing sW
are perceived and remembered. Experiences can only be perceived and
remembered at the highest level of the sW that they can be interpreted in
terms of. For example:

The scene - a cricket pavilion containing an infant, a child of a few
years and an experienced adult cricketer. The door flies open,
everyone's head turns to the door and another cricketer runs in and shouts
"One of our openers caught a nasty bouncer and needs a runner!"

The infant perceives an intense flood of light (as the door opens), a
series of noises, and a large noisy white shape that it may or may not
recognize as a person. The intense stimuli cause it to start crying.

The child perceives a man dressed all in white run into the room and
shout something. It seems to the child that he doesn't quite hear what
the man says, but he understands something bad has happened in the
game. Uninterested, he turns back to his toy.

The cricketer perceives that his captain needs him on the field and
runs outside after him.

None of them perceive or remember any concepts beyond their sW. In
addition the child and the man do not perceive the concepts of the
scene that are at a relatively low level in their sW. To them, these concepts
are boring. In general, concepts are boring if they are at too low or
too high a level. In the case of the former the subject derives no
information from the concept because it has already been encountered
and understood in all its many forms. In the case of the latter the
subject derives no information from the concept because it cannot even
be perceived - the relevant sW structure does not yet
exist. Concepts are interesting when they are at or just slightly
beyond the boundaries of the sW - this is where learning occurs,
because this is where information is available.

As an aside - concepts at a low level in the sW can be of interest
if they connect together deep, previously unconnected parts of the sW. These
events can be unexpected and often result in laughter. The basis of
most, if not all, humor is the creation of connections between
previously unconnected concepts deep in the sW.

In infancy, learning occurs by simply experiencing
the world - the cortex extracts features of the input that are at the
current highest level of its sW. From infancy through death the
cortex continues this process - we learn about anything that we spend
a long time experiencing. This is equivalent to saying that we become
experts by repeated practice. It is very hard to think of a process
that humans do not become better at through constant repetition (by
'better' I mean faster and more efficient). This is the secret of the
brain, and in fact the secret of human success - humans can get better
at anything through simply doing it repeatedly. There are constraints
though. As I said above, to learn one must attempt something only
slightly more difficult than present ability allows. This is because
it is only possible to advance by modifying and adding to the
existing sW. This applies whether in the sensory, motor or purely
abstract domain. This applies both while progressively attempting harder
and harder tasks and while improving performance on the same
task. However, now it becomes apparent that simply experiencing is not
enough. The cortex can keep abstracting features about the task at the
highest level of the sW, but some kind of selection criterion is needed.
Because the modification of the sW is not happening at a basic level
anymore, not dealing with the most fundamental properties of the
world, not all the new information is useful and needs to be
incorporated into the sW. The selection criterion used by the cortex
is success/failure - if the new information extracted produces behavior that
significantly improves or worsens performance on the task, that
information is incorporated into the sW. The corresponding behavior is
reinforced or suppressed. This criterion is useful because it is
usually immediately apparent what impact the new information has on
performance. In addition, a single reward system that distributes its
effects throughout the brain can be used to enforce the selection
process. A good candidate for this reward system is the mesolimbic
dopamine system (see above).
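A loose sketch of that selection criterion: a single scalar reward signal (standing in for a diffuse system like mesolimbic dopamine) strengthens or suppresses whichever pathway produced the behavior. The two-action task and all numbers are invented:

```python
import random

# Loose sketch of reward-gated selection: a single broadcast reward
# signal strengthens the pathway that produced a beneficial behavior
# and suppresses the one that produced a harmful behavior.

random.seed(0)
weights = {"good_action": 0.5, "bad_action": 0.5}
reward_for = {"good_action": 1.0, "bad_action": -1.0}

def choose(weights):
    """Behave according to current synaptic strengths, plus noise."""
    return max(weights, key=lambda a: weights[a] + random.gauss(0, 0.1))

for _ in range(100):
    action = choose(weights)
    # Broadcast reward reinforces or suppresses the used pathway.
    weights[action] += 0.05 * reward_for[action]
    weights[action] = min(max(weights[action], 0.0), 5.0)
# "good_action" ends up far stronger than "bad_action"
```

Because the same scalar signal reaches every synapse, no pathway-specific teacher is needed - immediate success or failure is enough to shape behavior.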

As we get older our sW becomes more extensive. The extent of the sW
representing an area of knowledge about the real world is wisdom about
that subject. This develops through experience, influenced by
intelligence. Intelligence is the making of connections between
previously unconnected entities in the sW, in order to make the
structure of sW more similar to the real world (W) (Actually
the definition of intelligence is more complex than this). So as well
as influencing the development of the sW at the highest levels, it can
also be used lower down in the sW. Humans, in particular, can
productively spend time looking for connections between concepts deep
inside their sW (meditation). In
fact the lower down in the sW the concepts are that are connected
through intelligent thought, the more useful the connection (literally
a profound thought). However, this type of thought is not easy. It is
hard to change the deeper levels of the sW because so many higher
concepts depend on them. In the same way (and for the same reason)
that evolution operates by making changes late in fetal
development, the brain operates by making changes high up in the
sW. Thus as we get older it becomes harder and harder to change our
fundamental conceptions of the world. This may be accompanied by
physiological changes that make it physically harder for the cortex to
change as we get older. In addition, this explains the tendency people
have to stick to a belief in the first plausible explanation they hear
regarding some event, even when later evidence calls it into question:
The first explanation is incorporated into the sW. As time goes on,
other facts are incorporated in a complementary manner (the sW influences
perception, remember) and the erroneous concept gets buried deeper in
the sW. So when the erroneous concept is challenged by new data, all
relevant recent experience acts to reinforce it. Replacement with a
correct concept becomes harder as time goes on.

Person A explaining something to person B is the act of making
B's sW more similar to A's: A expresses the nature/structure of a part of his 
sW by voicing the symbols that are activated by that part. If
a homologous part of B's sW exists (has the same structure and is connected to
the same concepts at lower layers), these symbols will activate it in the same
way. The more similar A's and B's sW's are, the easier and more successful the
communication will be. Paradoxically, communication of ideas works
best when two people's sW's are identical. When we most need it, communication
works the worst.

This theory implies that abstract words can now be related to
quantifiable real world properties. Many have already been defined in this
way above. Some more examples: Wisdom is the area of cortex (at all
levels) devoted to a particular concept. Fact is W. Fiction is any part
of sW that is not identical to the corresponding part in
W. A superstition is the result of too few neurons being used to
represent a concept - erroneous prediction occurs. Elemental thought =
elemental percept = single neuron. Gut feeling/intuition is the
cortical prediction of the future.

An insane (incorrect) thought is an incorrect cortical prediction of
W. Perhaps the biggest source of human anxiety is the fact that we can
never be 100% sure that our sW = W, we can only test it with reality
(even then we can never know whether our Universe's highest level of W
is a fiction of some greater W - we cannot deny the possibility of
anything). Thus insanity is defined as a high level sW fiction that
has taken control of the cortex. This is a vicious circle - the more
the fiction grows, the easier it is to accommodate new data into it.

All humans live in the same physical world and have the same basic physical
structure. This means that their sW's have many parts in common at very deep 
levels - these parts are identical, thus they can fundamentally
understand each other. All humans' basic definition of meaning
(lowest level of sW) is the same. However, people have slightly different
genetics and different experiences. This means that two adults have
sW's that are different at higher levels - they have different
conceptions of reality. The more similar the genetics and more
importantly the experiences of the two people, the more similar their
sW's. We can now see that the question 'What is it like to be a bat?'
literally has no meaning.

I will explore the possibly disturbing consequences of this theory in
the next post. Since I am a neuroscientist, I will also talk more
about the brain: Oscillations are literally the threshold of
(subjective) reality! (Yay Charlie!). I will sketch the function of
the basal ganglia.

One last point - to build a nonhuman intelligence, build a huge
cortex and connect it to some inputs. The difficulty will come in
training it - deciding which higher level concepts it builds are
important. This is dangerous. By definition we will not be able to
understand any high level thoughts of an entity with more complex cortex
than us. This means we will not be able to predict its
actions. (Sounds like bad science fiction. But instead it's fact. W.)


Copyright Paul Bush 1996 all rights reserved etc etc.
