A Neural Theory of Mind

Paul Bush paul at phy.ucsf.edu
Mon Apr 22 16:21:23 EST 1996


I have extensively revised my theory of brain operation. Now all
mental activity is explained within one framework. The previous post
presented a theory with many holes and some inaccuracies. This
version, although containing more specific mechanistic information
about an overall operational principle, still contains some holes and
no doubt some inaccuracies. However, it is now a much stronger theory.

The dimensions of the real world that are relevant to us are space and time.
Since we exist in the real world, can interact with it and have goals
(remember the genes), we abstract information from the space/time
variations of the real world to achieve those goals.

At the lowest level of abstraction/analysis, our information input
rate is limited by the spatial and temporal sampling rate of our
sensory transducers. We sample at high rates to gather maximum
information. This generates a large amount of data, only a small
fraction of which is relevant in any one situation. The relevant
information must be abstracted. This process is performed by the
cortex. With this in mind (literally), let us consider how the brain
performs its function and in doing so generates our mental world:

Consider visual information entering the primary visual cortex from
the LGN. Space/time patterns of spikes enter layer 4 of the cortex. 
Useful information is abstracted in time and space - layer 4
cells fire slower and their RFs are bigger than their LGN
inputs. The input enters as very low level features, so contains much
irrelevant information that must be filtered out. Layer 4 projects to
layer 2/3, where more complex features are abstracted. Because the
sensory input contains relatively more information in the
spatial domain, temporal resolution is sacrificed to enhance spatial
resolution: Cells signalling similar features synchronize their
oscillations with cells further away in space (as recorded by Gray
and Singer). Extra information about spatial correlations in the input
can thus be sent onto the next level. At this low level the time
constants of information are large, thus synaptic weight change is hard to
induce - someone's mind cannot be easily changed at a low level. As the
signals move higher up in complexity space, more information is extracted and
firing rates decrease. In addition, as the concepts extracted span an
increasing distance in space/time, the dimension of time begins to
be used to represent information. Instead of oscillating to
signal spatial correlations, the time differences in neuronal firing
induced by the input patterns begin to become important. Higher levels
begin to model the relationships between high level concepts using
temporal intervals. Hence Abeles finds strict time relationships
between specific neurons at higher levels. At high levels the time
constant of information is small, thus synaptic weight changes, which
represent information, also have small time constants. In addition,
because the space/time distances being spanned by very high levels are
large, their temporal patterns of activity become predictions. They
become plans for the future. These high level plans, as well as low
level perceptions, are sent to the basal ganglia which then selects a
high level plan and imposes it on frontal cortex. The selection is
determined on the basis of competition from a number of lower brain
drives. Frontal cortex in turn imposes this
plan/perception on lower layers through successive feedback
connections. The plan is actually implemented by layered feedforward
projections to the motor cortex. At each stage, the number of
dimensions of the information is reduced until in primary motor cortex
the firing of neurons once again represents the physical space/time
dimensions. Again, since the resolution of the spatial information
required is higher than the temporal, synchronized oscillations are
used to signal spatial correlations at the expense of timing. The
neuronal firing rate increases as a higher rate of information must be
output to precisely direct the muscles.
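
A minimal sketch of the feedforward abstraction described above,
assuming each level simply averages non-overlapping windows of the
level below. The averaging rule, the function names (pool, hierarchy)
and the window widths are illustrative assumptions, not something the
theory specifies. It only shows that each stage ends up with fewer
units, each covering a wider stretch of the input:

```python
def pool(signal, width):
    """Average non-overlapping windows of `width` samples (assumed rule)."""
    return [sum(signal[i:i + width]) / width
            for i in range(0, len(signal) - width + 1, width)]


def hierarchy(signal, widths):
    """Stack pooling stages; each level summarizes the one below it."""
    levels = [signal]
    for width in widths:
        levels.append(pool(levels[-1], width))
    return levels


# Hypothetical input: 16 samples of a repeating 0, 1, 2, 3 pattern.
levels = hierarchy([float(i % 4) for i in range(16)], widths=[2, 2, 2])
sizes = [len(level) for level in levels]
print(sizes)
```

With a width of 2 at every stage, the 16 input samples shrink to 8, 4
and then 2 units, and each top unit summarizes 8 input samples.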

The basic mechanism is as outlined in the previous essay, with the
following refinements: Input patterns, primarily spatial,
representing features and objects in the real world are stored in
sensory cortices. These low level representations change slowly -
primary sensory cortical feature maps only shift by mm in healthy
brains. Sensory input activates patterns against a background of
feedback prediction. Weak inputs that are not predicted are not
perceived. A strong input (possibly coherent across a significant
distance) can cause activation of cells despite the lack of feedback
facilitation. This mismatch is detected in the basal ganglia as what
we call a distractor. Distractors are salient
because they are not predicted by top-down feedback. The basal ganglia 
evaluate and decide how much high level cortex (if any) to assign to 
represent this distractor. This is the process of
attention. If the distracting/novel stimulus is continually repeated,
neurons in the sensory cortex will eventually learn to fire in
response to it (Hebbian, as described previously) - a new object will
come to be represented. As we move into higher level cortex,
collections of features at lower levels are represented as complex
concepts. These concepts are more labile than low level
objects/features. The time constant of synaptic change becomes
shorter, learning becomes faster. Previously, I postulated that each
area (down to a single neuron in size) of cortex literally
represents a different concept. Nothing physically demarcates an
area. The neurons simply self-organize through competition for
input, so that neurons in the same area of (feature) space
are physically close. Each neuron in a group represents a similar but
slightly different feature. The number of neurons in the group
determines the resolution of analysis at that level. This explains
experiments where a large percentage of the neurons are as well tuned
to a stimulus as the behaving animal. There is no averaging, just high
resolution. The neurons do not 'calculate' anything except the sum of
their inputs. They learn to reflect reality by changing the synaptic
weights between them whenever a salient stimulus is repeated.
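
The gating and learning steps in this paragraph can be sketched as
follows. This is a toy illustration under stated assumptions: inputs
are strengths between 0 and 1, feedback prediction is a plain set of
expected features, the 0.8 cutoff for a 'strong' input is arbitrary,
and the Hebbian step just adds a fixed amount to the weight between
each pair of co-active features. The names perceive and hebbian_update
are hypothetical.

```python
from itertools import combinations


def perceive(inputs, predicted, strong=0.8):
    """Return (active, distractors) for a dict of feature strengths.

    Predicted inputs become active; unpredicted inputs become active
    only if strong, and are then flagged as distractors (a mismatch).
    """
    active, distractors = set(), set()
    for feature, strength in inputs.items():
        if strength <= 0:
            continue
        if feature in predicted:
            active.add(feature)
        elif strength >= strong:  # assumed cutoff for a 'strong' input
            active.add(feature)
            distractors.add(feature)
    return active, distractors


def hebbian_update(weights, active, rate=0.1):
    """Strengthen the weight between every pair of co-active features."""
    for a, b in combinations(sorted(active), 2):
        weights[(a, b)] = weights.get((a, b), 0.0) + rate
    return weights


active, distractors = perceive(
    {"edge": 0.3, "hum": 0.2, "flash": 0.9}, predicted={"edge"})
weights = hebbian_update({}, active)
print(active, distractors, weights)
```

Here the weak, unpredicted 'hum' is not perceived, the strong,
unpredicted 'flash' is flagged as a distractor, and repeating the same
input would keep strengthening the edge/flash weight.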

The time constant of synaptic (therefore conceptual) change continues
to decrease moving into higher level cortex. Time begins to be used to
represent relationships between concepts, so patterns become
predictions and plans. The highest level in most
mammalian cortex is the hippocampus. Here information is abstracted
across large distances in space/time, thus the actual rate of
information processed by the hippocampus is quite low. A single layer
of cells suffices to provide the synapses necessary to instantiate
the highest level conceptual representation that lower animals have. I
propose that the hippocampus, rather than a 'long-term memory
consolidator', is the site of 'thought' in lower animals. 'Place
cells' are literally that: Although we can't read the code of synaptic
weights that determine the higher level features the individual cells
are coding for, we can read the lower level representation of the
population as a whole. The cells are saying 'I am here'. Humans have
evolved frontal cortex to do faster thinking (literally), but they
still have a hippocampus, proportionally not much larger than that of
lower mammals. In humans the hippocampus functions as an intermediate
term memory (Rawlins). The synaptic time constants are such that
memories of concepts stored here last for a few minutes. The
relatively small capacity of the human hippocampus underlies the
psychological observation of the 7 +/- 2 limit to the number of basic
concepts that can be retained over relatively short time periods.

In humans the highest cortical areas (in complexity of concepts
represented) are in the frontal cortex. Here I propose that synapses
change with time constants of seconds. Concepts stored here as
spatio/temporal patterns are known as thoughts. Each thought is the
top of a large pyramid of neurons (shifter hypothesis) that goes all
the way down to the complete map of primary sensory neurons. In the
frontal cortex whole maps of feature space (literally idea space) form and
reform in seconds rather than the days or months of sensory cortex. 

The question of which thought processes are allocated to which neurons
and for how long is controlled by the basal ganglia. What is a thought
process? Just as in sensory cortex a new input is a feature that must
be perceived, in frontal cortex a new input is an idea that must be
understood. The two processes are neurally equivalent. Understanding is
the process of matching a new pattern to close patterns (in
feature space). The existing patterns are a dynamical model of some
process in the world in a very complex feature space. The dynamics of
a good model are the same as the real world dynamics being
modeled. Just as in the real world only certain event sequences are
possible in any system, only certain configurations of patterns can
coexist in the cortical module. So understanding is the act of
successfully incorporating a new concept (pattern) into an existing
conceptual structure (existing set of patterns). Now we see why
learning can only proceed incrementally, based on what we already
know. There is nowhere to incorporate totally new patterns. The more
the pattern is repeated, the stronger the synapses become and the
longer the thought is remembered.
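
A rough numerical sketch of that last point, assuming a saturating
weight increment per repetition and exponential decay afterwards. The
gain, the recall threshold and the two time constants (5 seconds for
frontal cortex, 300 seconds for hippocampus) are illustrative guesses
in the spirit of 'seconds' and 'a few minutes', not measured values,
and the function names are hypothetical.

```python
import math


def strengthen(repetitions, gain=0.3):
    """Weight after repeated presentations, saturating towards 1.0."""
    w = 0.0
    for _ in range(repetitions):
        w += gain * (1.0 - w)  # assumed saturating increment
    return w


def retention(w, tau, threshold=0.2):
    """Seconds until w * exp(-t / tau) falls to the recall threshold."""
    if w <= threshold:
        return 0.0
    return tau * math.log(w / threshold)


TAU_FRONTAL = 5.0        # assumed, in seconds
TAU_HIPPOCAMPUS = 300.0  # assumed, in seconds

for reps in (1, 5):
    w = strengthen(reps)
    print(reps, round(w, 3),
          round(retention(w, TAU_FRONTAL), 1),
          round(retention(w, TAU_HIPPOCAMPUS), 1))
```

Under these assumptions more repetitions give a larger weight and so a
longer retention time, and the same weight lasts longer in an area with
a longer time constant.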

Cortex evolved to represent space/time information. To do so features
of increasing complexity are abstracted across larger and larger
distances in space/time. This results in predictions of the future -
modules learn spatio/temporal patterns that replicate the dynamics of
a real world system in order to direct behavior within that system. In
humans frontal cortex performs predictions in conceptual space. These
predictions are called inferences. Trying to infer something is the 
activation of a number of close (in feature space) patterns in the
hope that a new pattern will be generated that represents a
concept in the real world. It works. This is human creativity. The
more profound the thought, the higher level the pattern, the more
abstracted the information, the more the new pattern relates together
old patterns.

Abstraction and the incorporation of new higher order patterns into
existing structures is the basis of human cognition and learning in
all modalities. In the frontal cortex more complex concepts explain
less complex ones. In sensory cortex complex objects unify separate
features into a Gestalt. In motor cortex complex plans (spatiotemporal
patterns) activate sequences of simpler actions. Ideas are understood
in frontal cortex, objects are perceived in sensory cortex, sets of
muscles are contracted from motor cortex. All the same neural
process, at different time scales. 

Frontal cortex is the highest level of integration in the brain. The
most complex information structures are represented and learned
here. Plans initiated here are imposed on lower areas through feedback
as perceptions and actions. The amount of cortex is limited, so
different lower brain drives
compete for the cortex. The basal ganglia assigns thought processes
(as defined above) to the frontal cortex. This is
attention. All cortical areas project to the basal ganglia, which
feeds back to impose its decision on the frontal cortex.
Previously I stated that sensory input (a distractor) can
cause a shift in attention - all cortical areas project to the basal
ganglia thus it can detect novel stimuli anywhere in feature
space. However, normally attention is allocated by the basal ganglia
depending on the state of various lower brain drives ('what you want to do').
The highest areas of frontal cortex are precious because it is here
that learning (understanding and inferring) happens the fastest,
whatever the modality in question. The higher up (in cortex) a process
is assigned to, the faster learning will happen. The longer the
process is allowed to run, the more complex (and hence more useful)
the features that will be abstracted. If all of the highest cortex is
allocated to processes, lower cortex must be used to 'run' the
process ('in the back of my mind'). The lower the cortex used, the
slower the learning and the more modality specific it becomes. If the
process has already been repeatedly practiced
(learned at lower levels) - then the process can be run without
needing any attention (higher frontal cortical activity). This can be
useful if a lot of frontal cortex is already occupied (a lot on your
mind) - you can just 'do it without thinking'. In order to abstract
very high level concepts (fully/deeply understand something) you need
to assign all frontal cortex to the task (concentrate your attention -
literally clear your mind).

'Staying focused on an activity' means using all frontal cortex for
that process. If a distractor appears and the basal ganglia assigns
cortex to analyze it, you 'lose your attention'. In this situation
ongoing processes are stopped, or continued at lower cortical
levels. Since high level synapses are very labile, a distraction of
just a few seconds can be enough for all the highest level patterns to
decay (forget what you were talking about, for example). When you
return to the interrupted activity it takes time to build up the high
level representations you had (getting back into the game, for
example; in the motor modality, attention is selection of a plan). A
conversation takes place in a very high level feature space in the real world
which will be mapped/modeled by high level (frontal) cortex just like
any other relevant feature space - your train of thought. If you are
distracted you can lose your train of thought. It literally decays
away. The more complex the concepts you were discussing the faster
they decay away, since they were instantiated in the highest level cortex.
If the basal ganglia cannot assign frontal cortex (attention) due to
two equally strong competing lower brain drives, for example, you are 'in two
minds' about something or even 'confused'.

So, processes in any modality can be assigned to frontal cortex. You
can concentrate on anything you are doing in order to learn to
improve. Since the information structures in frontal cortex only last
for a few seconds, the information must somehow be stored in lower
(perhaps modality specific) cortex. How? Consider that the raw
information is already present in the lower cortex. The frontal cortex
simply abstracts higher order correlations from the data. It 'sees'
relationships between existing unconnected patterns in lower
cortex. The act of identifying such a relationship is the creation of
a new pattern in frontal cortex. This new pattern, through a process
of feedback to the lower cortex, induces the patterns representing the
previously unconnected data to activate in such a way that new
(synaptic) relationships between them are formed. Hence what was just
learned higher up is passed down to be stored in longer term
memory. This process can continue incrementally, such that practice
results in continual improvement as more complex relationships are
abstracted from lower order features.

I don't have a good understanding of how the basal ganglia
work. Basically, though, the striatum appears to be a lateral
inhibitory network designed for filtering out input patterns
(compromised in Huntington's). A projection from the dopaminergic
midbrain perhaps provides a lower brain 'go' signal to initiate a
process in frontal cortex (compromised in Parkinson's). There is
probably a role for the two separate patch and matrix
systems. Interestingly, they receive input from different depths of
cortical layer 5. This would correspond to different stages of
prediction of the future in each cortical module. The basal
ganglia assign processes to high level frontal cortex (attention)
which has reciprocal relationships with the limbic cortex, which as
described previously is the model the brain makes of itself -
consciousness. In this way consciousness is integrated with perceptual state.
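
Since the details here are uncertain, the following is only a toy
picture of a lateral inhibitory network choosing between competing
drives. It assumes a simple winner-take-all iteration in which every
unit is suppressed in proportion to the summed activity of the others.
The function names, the inhibition strength and the example drive
values are all hypothetical, and nothing here is meant as a model of
real striatal circuitry.

```python
def compete(drives, inhibition=0.2, steps=50):
    """Iterate lateral inhibition until the weaker units are silenced.

    Assumption: inhibition is small enough that the strongest unit
    survives (0.2 works for the three units used below).
    """
    activity = list(drives)
    for _ in range(steps):
        total = sum(activity)
        # Each unit loses activity in proportion to all the other units.
        activity = [max(0.0, a - inhibition * (total - a)) for a in activity]
    return activity


def winner(activity):
    """Index of the most active unit."""
    return max(range(len(activity)), key=lambda i: activity[i])


final = compete([0.5, 0.6, 0.4])  # three hypothetical competing drives
print(final, winner(final))
```

The unit that starts out strongest is the only one left active, which
is one way to picture a single plan being selected. Two exactly equal
drives would never separate under this rule, loosely matching being
'in two minds'.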

This revised theory provides a better model of memory: Concepts,
perceptions, motor sequences etc are all stored as increased synaptic
strengths between neurons in the appropriate cortex. Learning occurs
at progressively faster rates as we move up cortex, from primary
sensory to frontal, but it all works via the same process - Hebbian
correlation. The decreasing time constant of synaptic change as we
move up cortex corresponds to the progression from long-term memory to
short-term memory. The shortest term memory is in frontal cortex - our
thoughts. The more a pattern is repeated, the stronger the weights
become. During REM sleep (dynamics controlled by thalamus), random
cortical activation (from the brainstem) activates patterns that have
been most strongly activated in the recent past. In addition, new
patterns that fire synergistically with old patterns (new concepts
that 'fit' with old ones well - well explained) are also
activated. The synaptic strengths of other patterns are reduced. In
this way sleep literally clears our mind of irrelevant
information. Note that due to modality specific storage at lower
levels and the increasing rate of synaptic change
as we progress up cortex, a single sleep process suffices to maintain
all forms of memory. Sleep deprivation produces disordered thoughts,
'loss of memory', hallucinations, motor tremors and eventually
death. Spurious synaptic weights continue to build up and prevent
normal functioning of the cortex. Sleep is essential for the continued
function of the brain. This theory explains why 'higher consciousness'
goes away during sleep - all memory with a short time constant
(thoughts) decays away, leaving the frontal cortex empty of
patterns. Longer term memory has a time constant long enough to endure
through the night, though the next day if you don't rehearse what you
did the day before you will soon forget it. Children sleep longer than
adults since they are learning (making new patterns) at a high
rate. Old people make patterns at a slower rate therefore need less sleep.
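
The sleep process described above can be caricatured in a few lines.
This sketch assumes that a pattern is replayed, and so strengthened,
if its recent activation exceeds a threshold, and that every other
pattern is scaled down. The threshold, the boost and decay factors and
the pattern names are hypothetical, and the 'fits with old patterns'
criterion is left out for brevity.

```python
def sleep(weights, recent_activity, replay_threshold=0.5,
          boost=1.2, decay=0.5):
    """Return new weights after one night of replay (assumed rule)."""
    after = {}
    for name, w in weights.items():
        if recent_activity.get(name, 0.0) >= replay_threshold:
            after[name] = min(1.0, w * boost)  # replayed: strengthened
        else:
            after[name] = w * decay            # not replayed: weakened
    return after


weights = {"route_home": 0.6, "stray_noise": 0.4}
recent_activity = {"route_home": 0.9, "stray_noise": 0.1}
after = sleep(weights, recent_activity)
print(after)
```

Skipping the call to sleep leaves the spurious weight at full strength,
which is the build-up this paragraph blames for the effects of sleep
deprivation.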

Like I said before, this theory provides neural definitions of most
'abstract' words that deal with mental functioning. Many concepts that
seemed unrelated are explained within the single framework as the
operation of a single function at different timescales. Intelligence
can be defined at a high level as the ability to achieve your
goals. It is composed of rationality and decision making: You need a
good model of the world to predict events (cortex) and a way to relate
your goals to the model (basal ganglia). The cortical component of
intelligence seems to be how fast and to what level you can keep
abstracting information - building higher and higher order
patterns. This is literally 'making sense'. The opposite of
intelligent is stupid or slow - the inability to quickly abstract
relationships within input information. Intelligence is the ability to
quickly discover theories.

I use the word 'discover' because real world knowledge/information 
or W must have a definite structure. We can see this by
considering that the cortex evolved to predict the future in time and
space - it maps the structure of the physical world in space/time. The
fact that this process works at many space and time scales is to me an
indication that 'random' processes (eg quantum mechanics) do not have an
influence on the features of the world that are relevant to
us. As we go to higher areas, the complexity of information space that
the neurons are mapping increases. In frontal cortex, the
relationships between high level ideas are mapped - our belief
structure. Just as in space/time, the cortex can make predictions in
'idea space'. They are called inferences. A guess is a pattern
produced by just a few neighboring patterns, an assumption is a
pattern produced by many surrounding patterns.

Here's a weird epilogue:

Since inferences work, W must have a definite structure
over arbitrary spaces and times (high level concepts abstract features
across much larger distances than low level features). We can think of
problem solving as mapping the structure of W. W is objective
reality. It exists, our brains map it. The process of mapping W is
based on two subprocesses - data collection and inference. For
'inference' we can substitute 'data fabrication'. Theories are
structures in W that must be discovered by mapping component concepts
and their interrelationships. As we go
higher up W (in complexity), the role of real world data diminishes -
the complex concepts are much less dependent on physical data than low
level perceptions - and thinking (meditation) becomes a viable tool of
discovery of the structure of W. The key question here is whether W
has a limit - is there a single concept at the top of complexity space
that contains all relevant information? The fact that lots of time is
yet to come is not relevant - this concept would abstract very far
forward into the future. It is interesting to note that many
Eastern philosophies assert that there is, and the path to reaching it
(progressively building sW) is through meditation after a series of
life experiences (data collection). The mind (frontal cortex) must be
cleared (eyes closed - shut out new distracting input). This concept
is often referred to as 'Oneness', the idea that all things are
related. The highest concept in W abstracts features from all other
features of the world, and so is undoubtedly a good candidate. They
also assert that this concept cannot be understood by anyone who has
not reached it themselves. We see that this is true by the neural
definition of understanding - a very high level new concept cannot be
incorporated into a low level sW. It is also asserted that this
concept would be the meaning of existence. Certainly any concept that
abstracted all the relevant information out of the world would
qualify. The principles of Buddhism state that life is full of
suffering caused by desire (lower brain inputs). The route to
enlightenment (highest concept in W) is to gain control over your
emotions and meditate (suppress lower brain inputs to free up all
frontal cortex to abstract very high level concepts from your existing
sW). I don't know what to make of this stuff.

Notice the use of the word relevant. Information abstraction in the
brain is done on the basis of behavioral relevance. The neocortex on
its own would not function - there are too many dimensions to map at
very high complexity - it relies on lower brain input to tell it which
features to store and which to ignore. However, the neocortex performs
its function (prediction/inference) best without any 'distorting'
lower brain input - the best (to guide behavior) model of the real
world is a faithful one. Thus the various lower brain inputs (emotions,
motivations, sensations) both guide and impede the function of the cortex.

Paul



Copyright Paul Bush 1996 all rights reserved etc etc.
