We are feature detectors

Paul Bush paul at phy.ucsf.edu
Tue Apr 30 21:16:29 EST 1996


This is more on my hypothesis that the sole function of the
cerebral cortex is to construct a model of the world. This time there
is a new ingredient - testable predictions!

You will get the most out of this if you have read my two previous
posts on the subject, since I am not going to re-present the same
details, and I am going to refer to previously made points.

I assert that the cortex does not 'compute' anything beyond higher
and higher order correlations over successively abstracted
features. A feature is any space/time invariance in the Universe
detectable by us and behaviorally relevant. 

The ultimate determination of behavioral relevance is made by the
basal ganglia (BG), since they integrate total cortical output and
then feed back to impose the correct 'plan of action/perception' on the
frontal cortex. Mark Laubach (an active researcher in the BG system)
views the "activation of a spiny cell [in the striatum of the BG] as
indicating a coincident activation of a collection of cortical cells
distributed in different locations in cortex and activations at
different levels of the striatum as representing unique info occurring
in relation to a common "event" (e.g., a behavior, thought)". He
thinks that the BG 'set the occasion for behavior', determining the
behavioral context in which any one action is implemented - "Neurons
in BG only respond to things motor or sensory _in the context of some
task_ (in the context of some structure)". I interpret this as selecting an
appropriate state from the many that are sent from cortex and imposing
this state on top-level cortex (as a motor plan or new thought
process, with its attendant perceptual state). How is this done?
Certainly no one knows yet for sure. Here's my speculation:

The striatum of the BG is divided into functional compartments called
patch and matrix. As Mark says, "Different receptor systems in the two 
compartments; maybe different regulation of a common neural
circumstance (in the sense of a coordinated pattern of firing)". In
the ventral striatum patches are innervated by cingulate/prelimbic
cortex but avoided by higher motor cortices. In my theory
cingulate/prelimbic cortex is the model of self (cf Crick) - it
receives direct hypothalamic and aminergic lower brain input,
which it correlates with behavioral input from the prefrontal
cortex. The input to the two ventral patch/matrix compartments come from
different depths of cortical layer 5, which corresponds to different
stages of prediction of the future in each cortical module. Thus I
propose that the ventral striatum (and perhaps other BG areas) selects
a plan from the prefrontal/higher motor input based on the context of
the input from the cingulate/prelimbic cortex. Berns and
Sejnowski have developed a model in which the globus pallidus (gp) is
specifically inhibited to allow one action through the thalamus (to
be implemented in frontal cortex) before subsequent excitatory input
from the subthalamic nucleus depolarizes the rest of the gp and
'closes the gate'.
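The gating logic can be caricatured in a few lines. This is a toy rate
sketch of the disinhibition idea only, not the Berns and Sejnowski
model itself; every unit, parameter and time course here is invented
for illustration:

```python
import numpy as np

# Toy rate sketch of pallidal gating: focused striatal inhibition
# silences one gp unit, opening its thalamic channel, until slowly
# building subthalamic (stn) excitation re-depolarizes the whole gp.
n_actions = 4
selected = 2                 # action channel favored by striatal input

gate_open = []               # which thalamic channels are disinhibited
for t in range(10):
    striatal_inhibition = np.zeros(n_actions)
    striatal_inhibition[selected] = 1.0   # focused inhibition of one gp unit
    stn_excitation = 0.2 * t              # slowly building subthalamic drive
    gp = np.clip(1.0 - striatal_inhibition + stn_excitation, 0.0, 1.5)
    gate_open.append(gp < 0.5)            # low gp activity = open gate

# Early on only the selected channel is open; as the subthalamic drive
# builds it depolarizes the rest of the gp and 'closes the gate'.
```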

In this (or a related) way the BG select action based on behavioral
relevance. In lower animals perhaps the direct dopaminergic striatal 
projection is more important in determining behavioral context, but
human behavior is often guided by 'higher feelings'. How does this
work? Because we are physiological beings, we have basic motivations
that demand satisfaction (eating, drinking, sleeping, sex etc). Each
cortical plan/prediction of behavior (from prefrontal cortex) enters 
prelimbic/cingulate cortex where it elicits an emotional response. The
prelimbic cortex correlates the behavioral plans with emotional
responses - its feature space is our personality. Higher order
correlations combine many emotions and behaviors in a way that is not
always constrained to satisfy the original drives. Prelimbic cortex
does not project to motor areas - there is no implementation of
personal goals this way. Instead, possible plans enter prelimbic
cortex from neighboring prefrontal cortex. The spatio/temporal
patterns that instantiate these plans produce output from prelimbic
networks - the emotional reaction to each plan, or perhaps a
modification of the plan based on previous emotional responses to the
same/similar plans. The prefrontal/higher motor and the prelimbic
cortices project to separate regions of the striatum and the final
action is selected in the BG (as outlined above).

(Note that my theory predicts that the operation of the BG in neonates
must be radically different from adults (or older children), since
neonates display essentially random behavior with almost no top-down
selection.)

In this way the BG are implementing selective attention. Previously I
outlined how the BG deal with distractors, assigning prefrontal cortex
to represent (and abstract higher order features from) new stimuli. My
conception of the BG's behaviorally selective action as attention is
similar in a number of ways to a recent theory of the BG proposed by
Jackson and Houghton. What is attention? Attention is the assignment
of prefrontal cortex to the 'processing' of input - attention is where
learning is happening. Why do we need attention? Firstly, the
individual must prioritize its actions - select the actions that will
most efficiently accomplish its goals. As proposed previously, the
prefrontal cortex 'processes' data quickest, because it has the
fastest time constant of (synaptic) change. New concepts are abstracted
from simpler ones, and entirely new concepts/ideas are created in a
process of pattern induction. Attention can be considered as the
process of getting as much of the most important data where it needs
to go as quickly as possible. At the level of the individual, the
determination of 'important' must be done using behavioral relevance.

I propose that the BG implement volitional 'selective' attention,
analogous to Posner's anterior attentional system. Lower down in
cortex, closer to the primary sensory areas, the posterior attentional
system operates. I previously described the development of an infant's
cortical model of the world (sW), stating that at first patterns are
constructed without top-down direction (neonate is randomly
active). This is because at low levels all features abstracted are
important - low levels of sW are largely determined by data. As the
concepts (patterns) constructed come to have behavioral significance,
top-down influence on learning begins to have an increasing role. This
top-down influence is attention, as described above. I propose that
the posterior attentional system is performing the same function as
the anterior system, but with decreasing top-down control. Attention
at this stage is simply a process of maximising the flow of
information. How does this work? 

Let's consider the visual system: It works because the features that
are most informative at low levels do not change over the lifetime of
the individual. What are the most basic features of a visual target,
beyond the presence or absence of light at a point? In order, they
are: Where is it going, how fast is it going, what is it? These
features are the first to be encoded in nearly all visual systems. In
lower animals direction and velocity selectivity appear in the
retina. Postponing feature abstraction to higher levels allows more
information to be abstracted; thus in the cat, cells do not show velocity
or direction selectivity until layer 4 of primary visual cortex. These
cells also abstract another feature - contrast edges - the first step
in answering 'what is it?' The higher the contrast, the more they
fire. These three features convey the most information about the
stimulus - most other higher-level features are abstracted from
them. They are the 'principal components' of the visual input
data. Thus, the brain can abstract the most information about the
stimulus if it makes the firing rate of its sensory neurons dependent
on these features.

Layer 4 cells have a number of (Gaussian) tuning
curves - they fire best to a stimulus with the optimal orientation,
velocity and direction, but fire to some degree to slightly
non-optimal stimuli. (This means that rather than signalling discrete
coordinates in feature space, neuronal firings create
multi-dimensional Gaussian probability density functions (pdf). The
more a neuron fires, the smaller space its pdf occupies - it signals with
more accuracy.) If a layer 4 cell is firing maximally, it is being
stimulated by its best stimulus, and it is providing the most
information about that stimulus. I propose that in order to increase
the flow of this information, cortical circuitry increases the stimulus
contrast that this neuron receives - because the neuron's firing is
caused by the presence of the most informative features, attention at
this level reduces to further enhancing its firing relative to other
neurons at the same level.

In V1 (at least) this is done by lateral inhibition. Lateral
inhibition operates within a cortical column, but it is the
connections between cortical levels that I will focus on now. The
effects of attention are propagated down by the
activation of feedback connections that enter layer 1 of lower
levels. These feedback connections can depolarize the neurons in one
area of feature space and hyperpolarize neighboring regions, shrinking
receptive fields, increasing contrast. This process continues all the
way to the lgn, where it operates with the least precision (over slow time
scales and large areas) - the feedback of layer 6 cells in V1 excites (with a
long time constant - metabotropic receptors) lines of
lgn cells (aligned with the layer 6 RFs?) and inhibits surrounding
lgn cells via more diffuse projections to the inhibitory reticular
nucleus of the thalamus. The diameters of the layer 6 axons (therefore
their conduction velocities) vary widely so that the contrast
enhancement is temporally blurred, to match its broad spatial
extent. This fits with the only known operation of lgn RFs - contrast
enhancement. At low cortical levels, then, because neurons close in
feature space are close in physical space and because their firing is
dependent on the most informative features in the input, feedback
connections instantiate attentional processes.
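The tuning-curve picture above can be sketched minimally. The preferred
orientations and tuning width below are invented placeholders, not
measured values:

```python
import numpy as np

# A population of layer-4-like cells with Gaussian tuning curves over
# one feature dimension (orientation, in degrees). Each cell fires
# best to its preferred orientation and fires to some degree to
# slightly non-optimal stimuli.
preferred = np.arange(0, 180, 20)        # preferred orientations (invented)
sigma = 25.0                             # tuning width (invented)

def responses(stimulus_orientation):
    """Firing rate of each cell for one stimulus."""
    d = stimulus_orientation - preferred
    d = (d + 90) % 180 - 90              # wrap orientation difference
    return np.exp(-d**2 / (2 * sigma**2))

r = responses(45.0)
best = preferred[np.argmax(r)]
# The harder a cell fires, the closer the stimulus is to its preferred
# value - i.e. the smaller the region of feature space its pdf occupies.
```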

I propose that the role of the BG in the anterior attentional system
is played by the pulvinar (+LP nuc?) in the posterior attentional
system. Because one is at the top and one near the bottom of the
cortical hierarchy, the BG select plans, while the pulvinar maximizes
information flow. The pulvinar is a large nucleus in the thalamus that
reciprocally projects to most visual cortical areas. Van Essen's group
has recently proposed a model based on the shifter circuit hypothesis
that assigns a similar role to the pulvinar. Theirs is a model of
attention in which higher cortical areas 'zoom in' on features of
interest, creating a higher resolution 'window' of the target. The
pulvinar acts to dynamically change visual cortical synaptic strengths
to 'route' information from a selected area of the visual image up to
IT. One prediction of their model is that the feature space (therefore
2D map) of IT should change on the time scale of attention
(seconds). In contrast my theory predicts this fast change in
prefrontal cortex, not IT. Instead I propose that the pulvinar simply
coordinates the contrast enhancement of a region in feature space in
which the neurons are already firing strongly. Since the pulvinar
projections are excitatory, one obvious mechanism is lateral
inhibition as produced by layer 6 cells in the lgn (discussed
above). In essence, I am claiming that the pulvinar assists in
changing the dynamics of visual cortex, whereas they claim that the pulvinar
changes the synaptic weight structure. Some top-down (BG) attentional
effects may filter down to visual cortex (the lowest level affected is
controversial) through cortical feedback connections and also through
a parietal projection to the pulvinar, but there is a definite
data-constrained bottom-up information stream that merges into a
behaviorally constrained top-down information stream.

The Van Essen model predicts that lesions of the pulvinar should
affect visual pattern recognition abilities (besides attention), which has been
shown not to be the case in a number of studies. Instead my theory
predicts that pulvinar lesions should significantly slow the rate of
learning of new patterns, and impair performance on 'higher' tasks that depend
on the use of (complex) visual patterns, because the pulvinar maximizes 
visual information flow to all higher areas.

As noted above, my theory predicts fast changes in mapping in
prefrontal cortex, rather than in IT. Previously I stated that all
'mental functions' could be explained by the processes of pattern
formation, matching and induction, operating at different time
scales. Here is a table that displays that data: (see text below for
explanation) 

Level       'Processing'            Learning              (rate)  Predicting

Visual cx   Recog. spatial pttrn    (data) new pattern    (slow)  Hallucination
Assoc. cx   Recog/regen s/t pttrn   'fit' new s/t pttrn           Process = prediction
Preftl cx   Regen. s/t patterns     'fit' new s/t pttrn   (fast)  Induce new pttrn


In visual cortex primarily spatial firing patterns are stored as
synaptic weight changes and later examples are
recognised as perceptions. As noted previously, the synchronized
oscillations recorded by Singer et al are the use of the temporal
'channel' to carry more spatial information, since temporal resolution
at this level is not very high. These oscillations arise
when the cells are optimally stimulated, i.e. when they are
experiencing the highest contrast stimuli - this is when most
information is available about the stimulus. Given a V1 complex RF width of 1
degree, the optimal velocity tuning of about 20 deg/sec would predict
a temporal resolution of about 20 Hz, the lowest frequency
synchronized oscillation seen experimentally. The higher oscillation 
frequencies (up to about 80 Hz) result from temporal correlations
between simple cell firings detected by the complex cells. The
broadening of orientation tuning seen in these complex cells (and
perhaps the broadening in tuning seen in deep layers vs. upper layers)
occurs because higher levels abstract across larger distances in all
dimensions, orientation as well as the 4 conventional space/time
dimensions. These cells are not directly 'used' for orientation
discrimination by the individual organism, rather they are used to
construct much higher order features/concepts that incorporate line
orientation as a component feature. These higher order concepts are
the ones 'used' for behavioral judgements/decisions. Prediction of new
patterns does not occur in visual cortex (except pathologically as
hallucinations) because at this low level pattern structures are
constrained by data.
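The temporal resolution estimate above is just the receptive field
crossing time, which can be checked directly:

```python
# A stimulus crossing a receptive field of width w at velocity v
# occupies the field for w/v seconds, giving a rate of v/w events/sec.
rf_width_deg = 1.0         # V1 complex cell RF width (from the text)
velocity_deg_per_s = 20.0  # optimal velocity tuning (from the text)

temporal_resolution_hz = velocity_deg_per_s / rf_width_deg
# 20 Hz, matching the lowest synchronized oscillation frequency seen.
```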

At higher cortical levels, abstracted across sufficiently large
distances in (space and) time, rather than remaining as its own
representation, time is used as a third representational
dimension. This allows the change from spatial patterns of firing to
spatio-temporal patterns. Thus association cortex recognises and
regenerates new s/t patterns during perception and recall. New
input patterns 'fit' with existing patterns because the sW at this level
begins to have some objective structure (cf viewpoint consistency
constraint). This is the process of understanding. As described
previously, learned high order spatio-temporal patterns are
predictions: When the pattern is activated it goes through its
sequence, predicting which features will occur (and when they will
occur). This applies in all modalities, though it is most intuitive
when considering the motor modality.

In order to keep abstracting invariances out of increasing numbers of
dimensions of information, time is signalled with increasing precision
'up' cortex. At the highest cortical levels (prefrontal cortex), the
dynamics of the active conductances in the neuronal membrane are tuned
to cause this precise spike timing. At this point (or a point soon after?) 
synapses no longer store patterns. Perhaps the kinetics of the
conductances change on fast time scales to form a fast memory, but
this reaches a limit where the s/t patterns fade over seconds. These
patterns are our faster thoughts - the edge of thought is in
prefrontal cortex. Prefrontal cortex understands new
concepts (fits new patterns) the quickest, and because at high levels
there is a strong objective constraint on the structure of the feature
space, new patterns can be induced to appear by the activation of a
number of nearby patterns. This induction process is the basis of
human creativity. Goldman-Rakic has characterized the activation of
prefrontal neurons (in a motor or oculomotor task with spatial and/or
temporal delays) as occurring only from 'a representation or
concept'. In her view, prefrontal cortex does memory guided
performance - responses are driven by internal representations as
opposed to 'associative processes', sensory guidance, or
reflexes. This thinking fits exactly with my conception of the
prefrontal cortex. (Though I think prefrontal cortex is doing
basically the same 'process' as other cortical areas, just with highly
abstracted input and at a faster rate).

(Animals with prefrontal cortex lesions are not able to learn these
delayed spatial tasks. My theory predicts that animals with lesioned
BG or mediodorsal thalamic nucleus would not be able to learn them
either.)

If the highest level patterns abstracted (thoughts) fade from
prefrontal cortex in seconds, then this information must in some way
be stored in lower cortex. As I said previously, this is a matter of
creating connections between the existing less complex features in the level
below to reflect the relationships abstracted at the higher level. I
believe this occurs to some degree during waking but mostly during
sleep. The cortex operates in some way analogous to a neural network
called a 'Helmholtz machine'. This network has multiple layers of
units, each sending feedforward projections to the layer above and
receiving feedback connections from that same layer. In training the
network receives input patterns and changes the synaptic weights of
the feedback connections. In 'sleep' the input connections are turned
off and random activation of the feedback connections is used to
change the weights of the feedforward synapses. The sleep process can
be viewed as a form of redundancy reduction. I propose cortical
networks operate according to the same principle - the 'learning'
during random feedback activation during REM sleep acts to reduce
information redundancy such that higher level relationships between
features/concepts (essentially connections between neurons) are
reduced to relationships between the component features at lower
levels. In this way the information defining high level concepts
actually moves back down cortex towards lower level sensory
areas. Since the lower levels change more slowly, it takes longer to
move information down to them, but once there it lasts for a long
time (consolidation of long term memory). Thus in an adult
concepts/features at many levels of complexity are stored at many
cortical levels. This explains why LTP can be quickly induced in lower
cortices - sometimes we discover that one of our
(literally) deepest and longest held assumptions is wrong. This is
often a shocking experience, because changing it involves making a
fast change in cortex that normally changes slowly, and then making
changes in all the concepts that were dependent on the incorrect assumption.
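The wake-sleep procedure described above can be sketched in a few
lines (after Dayan, Hinton, Neal and Zemel's 1995 Helmholtz machine;
the layer sizes, learning rate and random training data below are
arbitrary placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal two-layer Helmholtz machine sketch. R carries the
# feedforward (recognition) weights, G the feedback (generative) ones.
n_visible, n_hidden = 6, 3
R = rng.normal(0, 0.1, (n_hidden, n_visible))
G = rng.normal(0, 0.1, (n_visible, n_hidden))
lr = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

def wake(v):
    """Drive the network with real input; train the feedback weights
    to reconstruct it (the 'training' phase in the text)."""
    global G
    h = sample(sigmoid(R @ v))
    v_pred = sigmoid(G @ h)
    G += lr * np.outer(v - v_pred, h)       # local delta rule

def sleep():
    """Random activation propagated through the feedback weights is
    used to change the feedforward synapses (the 'sleep' phase)."""
    global R
    h = sample(np.full(n_hidden, 0.5))      # random 'dream' causes
    v = sample(sigmoid(G @ h))
    h_pred = sigmoid(R @ v)
    R += lr * np.outer(h - h_pred, v)       # local delta rule

data = sample(np.full((20, n_visible), 0.5))
for epoch in range(5):
    for v in data:
        wake(v)
    sleep()
```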

The cortical architecture is significantly different
from that of the Helmholtz machine. Perhaps the biggest difference is
that at each level, instead of a single layer of units that receives
both top down and bottom up input, each cortical level is made up of 
columns of cells. Each column shows characteristic changes from top to bottom:
The feedback input enters at the top (mostly layer 1). Layer 2 cells
receive most of their input from this feedback projection. The upper
layers of the column contain cells that are connected at a density of
about 10%, synaptic weights are generally small and inhibition (at
least the slow GABAb inhibition) is relatively strong. Cells in the
upper layers fire trains of spikes that adapt (decrease in
firing frequency). As we move down the column to layer 5, the density of
connectivity decreases to about 1%, synaptic strengths becomes larger,
inhibition decreases and the cells fire in discrete bursts rather than
continuous spike trains. The feedforward input to the column goes
mostly to the upper layers, with small amounts to the lower
layers. Overall, the cortical column seems to be performing some type
of annealing function (cell output functions change from sigmoids to
steps), merging the feedback with the feedforward input in the upper
layers to create discrete outputs in the sparsely (but strongly)
connected output layer (layer 5).
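The annealing reading can be illustrated by sharpening an output
function down the column; the gains below are invented, purely to show
the sigmoid-to-step progression:

```python
import numpy as np

# Illustration only: cell output functions change from graded sigmoids
# in the upper layers to nearly step-like burst thresholds in layer 5,
# modeled here as the same sigmoid with increasing gain.
def output_fn(drive, gain):
    return 1.0 / (1.0 + np.exp(-gain * drive))

drive = np.linspace(-1, 1, 5)
upper_layer = output_fn(drive, gain=2.0)    # shallow: graded firing rates
layer_5 = output_fn(drive, gain=50.0)       # steep: ~discrete bursts
# layer_5 is close to a step function: near 0 below threshold, near 1 above.
```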

Another important point is that many (~30%) cortical pyramidal cells
do not project out of their local area (into the white matter). Thus
these cells are part of the local representation
but do not signal any features since they do not project to the other
levels. Perhaps they 'stabilize' the patterns in some way.

Recent experiments in behaving rats show that the spatio/temporal
firing patterns (as seen between two cells) learned by hippocampal
neurons during task performance repeatedly reoccur during REM sleep. I
propose that this is an example of the process described above. 

Paul



Copyright Paul Bush 1996 all rights reserved etc etc.



More information about the Neur-sci mailing list