as the number of predicates increases, the proportion of functions that
are linearly separable becomes smaller, as is made clear by the
following extract from Wasserman (1989) on the concept of linear
separability:
'We have seen that there is no way to draw a straight
line subdividing the x-y plane so that the exclusive-or
function is represented. Unfortunately, this is not an
isolated example; there exists a large class of
functions that cannot be represented by a single-layer
network. These functions are said to be linearly
inseparable, and they set definite bounds on the
capabilities of single-layer networks.
Linear separability limits single-layer networks to classification
problems in which the sets of points (corresponding to input values)
can be separated geometrically. For our two-input case, the separator
is a straight line. For three inputs, the separation is performed by a
flat plane cutting through the resulting three-dimensional space. For
four or more inputs, visualisation breaks down and we must mentally
generalise to a space of n dimensions divided by a "hyperplane", a
geometrical object that subdivides a space of four or more
dimensions.... A neuron with n binary inputs can have 2 exp n
different input patterns, consisting of ones and zeros. Because each
input pattern can produce two different binary outputs, one and zero,
there are 2 exp 2 exp n different functions of n variables.
As shown [below], the probability of any randomly selected function
being linearly separable becomes vanishingly small with even a modest
number of variables. For this reason single-layer perceptrons are, in
practice, limited to simple problems.'
n    2 exp 2 exp n      Number of Linearly Separable Functions
1    4                  4
2    16                 14
3    256                104
4    65,536             1,882
5    4.3 x 10 exp 9     94,572
6    1.8 x 10 exp 19    5,028,134
P. D. Wasserman (1989)
Linear Separability: Ch2. Neural Computing Theory and Practice
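
The counting argument in the extract can be checked directly for small
n. The following is a minimal Python sketch, not Wasserman's own
method: it enumerates the 2 exp n input patterns and counts how many of
the 2 exp 2 exp n possible truth tables a single threshold unit can
realise. The function names, and the restriction to integer weights
between -3 and 3, are assumptions made for illustration; that bound is
sufficient for n up to 3 but not for larger n.

```python
from itertools import product


def count_linearly_separable(n, max_weight=3):
    """Count truth tables of n binary inputs realisable by one threshold unit.

    Assumption: integer weights in [-max_weight, max_weight] and an integer
    threshold are enough to reach every separable function (true for n <= 3).
    """
    patterns = list(product((0, 1), repeat=n))  # the 2**n input patterns
    bound = max_weight * n                      # largest possible weighted sum
    realised = set()
    for weights in product(range(-max_weight, max_weight + 1), repeat=n):
        for threshold in range(-bound, bound + 2):
            # The unit outputs 1 when the weighted sum reaches the threshold.
            truth_table = tuple(
                int(sum(w * x for w, x in zip(weights, pattern)) >= threshold)
                for pattern in patterns
            )
            realised.add(truth_table)
    return len(realised)


def table_rows(max_n=3):
    """Rows of (n, input patterns, all functions, separable functions)."""
    return [
        (n, 2 ** n, 2 ** (2 ** n), count_linearly_separable(n))
        for n in range(1, max_n + 1)
    ]
```

Under those assumptions, table_rows(3) gives 4, 14 and 104 separable
functions for n = 1, 2 and 3, which agrees with the counts quoted above
for n = 2 and n = 3. The two functions of two inputs that are never
reached are exclusive-or and its negation. The brute-force search grows
far too quickly to reproduce the later rows.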
In later sections, evidence is presented, in the context of clinical vs.
actuarial judgment, that human judgment is severely limited: it can
process only a few variables, and beyond that non-linear fits become
more frequent. This is discussed later in the context of connectionist
'intuitive', inductive inference and constraints on short-term or
working memory span (cf. Kyllonen & Christal 1990, "Reasoning
Ability Is (Little More Than) Working-Memory Capacity?!"). It is
worth mentioning here, however, that in the epilogue to the expanded
reprint of their 1969 review of neural nets, 'Perceptrons - An
Introduction to Computational Geometry', Minsky and Papert (1988)
first reiterated their original criticism that neural networks had
only been shown to be capable of solving 'toy problems', i.e. problems
with a small number of dimensions, using 'hill climbing' algorithms,
and then effectively did a 'volte face' and said:
'But now we propose a somewhat shocking alternative:
Perhaps the scale of the toy problem is that on which,
in physiological actuality, much of the functioning of
intelligence operates. Accepting this thesis leads into
a way of thinking very different from that of the
connectionist movement. We have used the phrase "society
of mind" to refer to the idea that mind is made up of a
large number of components, or "agents," each of which
would operate on the scale of what, if taken in
isolation, would be little more than a toy problem.'
M Minsky and S Papert (1988) p266-7
and a little later, in a passage which is very germane to the
fragmentation of behaviour view being advanced in this volume:
'On the darker side, they [parallel distributed
networks] can limit large-scale growth because what any
distributed network learns is likely to be quite opaque
to other networks connected to it.'
ibid p.274
This *opacity* of aspects, or elements, of our own behaviour to
ourselves is central to the theme being developed in this volume,
namely that a science of behaviour must remain entirely extensional
and that there cannot therefore be a science or technology of
psychology to the extent that this remains intensional (Quine
1960, 1992). The discrepancy between experts' reports of the
information they use when making diagnoses (judgments) and the
information they can be shown to use is reviewed in more detail in a
later section. However, research reviewed in Goldberg (1968) suggests
that even where diagnosticians are convinced that they use more than
additive models (i.e. use interactions between variables, which
statistically may account for some of the non-linearities), empirical
evidence shows that in fact they only use a few linear combinations
of variables (cf. Nisbett and Wilson 1977, in this context).
As an illustration of methodological solipsism (intensionalism) in
practice, consider the following, which neatly contrasts the subtle
difference between the methodological solipsist approach and that of
the methodological or 'evidential' behaviourist.
Several years ago, a prison psychologist sought the views of prison
officers and governors as to who they considered to be 'subversives'.
Those considered 'subversive' were flagged 1, those not considered
subversive were flagged 0. The psychologist then used multiple
regression to predict this classification from a number of other
behavioural variables. From this he was able to produce an equation
which predicted subversiveness as a function of 4 variables: whether
or not the inmate had a firearms offence history, the number of
reports up to arrival at the current prison, the number of moves up to
arrival where the inmate had stayed more than 28 days, and the number
of inmate assaults up to arrival.
Note that the dependent variable was binary, the inmate being
classified as 'subversive' or 'not subversive'. The prediction
equation, which differentially weighted the 4 variables, therefore
predicted the dependent variable as a value between 0 and 1. Now the
important thing to notice here is that the behavioural variables were
being used to predict something which is essentially a propositional
attitude, i.e. the degree of certainty of the officers' beliefs that
certain inmates were subversive.
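
The kind of equation described here can be sketched as an ordinary
least-squares fit of a 0/1 flag on four behavioural counts. The Python
below is only an illustration under stated assumptions: the ten records
and their flags are invented for the example and are not the
psychologist's data, the function names are hypothetical, and the
original analysis may have used a different regression procedure.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_least_squares(rows, y):
    """Return [intercept, b1, ..., bk] minimising the squared error."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Weighted sum of the predictors plus the intercept."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))


# Invented records: (firearms history 0/1, reports, moves over 28 days, assaults)
INMATES = [
    (1, 12, 5, 3), (0, 2, 1, 0), (1, 8, 4, 1), (0, 5, 2, 0), (0, 15, 6, 2),
    (1, 1, 1, 0), (0, 9, 3, 1), (1, 20, 7, 4), (0, 0, 2, 0), (0, 11, 5, 2),
]
FLAGGED = [1, 0, 1, 0, 1, 0, 0, 1, 0, 1]  # invented staff flags (1 = 'subversive')

COEFS = fit_least_squares(INMATES, FLAGGED)
SCORES = [predict(COEFS, row) for row in INMATES]
```

Each entry of SCORES is the weighted combination of the four measures
for one record. A least-squares fit of this kind is not itself
constrained to stay between 0 and 1, so the scores are best read as a
rough degree of agreement with the staff classification.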
The methodological solipsist may well hold that the officers' beliefs
are what are important. The methodological behaviourist, however, would
hold that what the officers thought was just *an approximation of what
the actual measures of inmate behaviour represented*, i.e. their
thoughts were just vague, descriptive terms for inmates who had lots of
reports, assaulted inmates and had been moved through lots of prisons,
and were probably in prison for violent offences. What the officers
thought was perhaps not all that important, since we could just go to
the records and identify behaviours which are characteristic of
troublesome behaviour and then identify inmates as a function of those
measures (cf. Williams and Longley 1986).
In the one case the concern is likely to be with developing better and
better predictors of what staff THINK, and in the other, it becomes a
matter of simply recording better measures of classes of behaviour and
empirically establishing functional relations between those classes.
In the case of the former, intensional stance, one becomes interested
in the *psychology* of those exposed to such factors (i.e. those exposed
to the behaviour of inmates, and what they *vaguely or intuitively
describe it as*). From the extensional stance (methodological
behaviourist) defended in these volumes, such judgments can only be a
**function** of the data that staff have had access to. From the
extensional stance, one is simply interested in recording *behaviour*
itself and deducing implicit relations. Ryle (1949) and many
influential behaviourists since (Quine 1960) have, along with Hahn
(1933), suggested that this is our intellectual limit anyway:
'It is being maintained throughout this book that when
we characterize people by mental predicates, we are not
making untestable inferences to any ghostly processes
occurring in streams of consciousness which we are
debarred from visiting; we are describing the ways in
which those people conduct parts of their predominantly
public behaviour.'
G. Ryle
The Concept of Mind (1949)
Using regression technology as outlined above is essentially how
artificial neural network software is used to make classifications. In
fact, there is now substantial evidence to suggest that the two
technologies are basically one and the same (Stone 1986), except that
in neural network technology the regression variable weights are
opaque to the judge; cf. Kosko (1992):
'These properties reduce to the single abstract property
of *adaptive model-free function estimation*: Intelligent
systems adaptively estimate continuous functions from
data without specifying mathematically how outputs
depend on inputs...A function f, denoted f: X -> Y, maps
an input domain X to an output range Y. For every
element x in the input domain X, the function f uniquely
assigns the element y to the output range Y.. Functions
define causal hypotheses. Science and engineering paint
our pictures of the universe with functions.'
B. Kosko (1992)
Neural Networks and Fuzzy Systems: A Dynamical Systems
Approach to Machine Intelligence p 19.
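
The claimed kinship between regression and neural networks can be
illustrated in its simplest case: a single linear unit trained by the
delta rule settles on the same weights as a least-squares regression
line. The Python below is a minimal sketch under stated assumptions, not
a reconstruction of Stone's analysis: the five data points, the learning
rate, the number of epochs and the function names are all chosen for
illustration.

```python
def least_squares_line(xs, ys):
    """Closed-form simple regression: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_linear_unit(xs, ys, rate=0.05, epochs=5000):
    """Batch delta rule for one linear unit: returns (weight, bias)."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Error of the unit's output on every training pattern.
        errors = [y - (weight * x + bias) for x, y in zip(xs, ys)]
        # Move the weight and bias a small step down the squared-error gradient.
        weight += rate * sum(e * x for e, x in zip(errors, xs)) / n
        bias += rate * sum(errors) / n
    return weight, bias


# Illustrative data only.
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [1.1, 2.9, 5.2, 6.8, 9.0]

REGRESSION = least_squares_line(XS, YS)
NETWORK = train_linear_unit(XS, YS)
```

With these settings the two pairs of numbers agree to several decimal
places. The difference lies in how they are obtained: the regression
weights come from an explicit formula, while the unit's weights emerge
from repeated small adjustments.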
Today, formal modelling of such intensional processes is researched
using a technology known as 'Neural Computing' which uses inferential
statistical technologies closely related to regression analysis.
However, such technologies are inherently inductive. They take samples
and generalise to populations. They are at best pattern recognition
systems.
Such technologies must be contrasted with formal deductive logical
systems, which are algorithmic rather than heuristic (extensional
rather than intensional). The algorithmic, or computational, approach
is central to classic Artificial Intelligence and is represented today
by the technology of relational databases, along with rule-based and
Knowledge Information Based Systems (KIBS), which are based on the First
Order Predicate Calculus, the Robinson Resolution Principle (Robinson
1965, 1979) and the long term objectives of automated reasoning (e.g.
Wos et al. 1992).
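
By way of contrast with the inductive sketches, the deductive style can
be illustrated with resolution in its simplest, propositional form. The
Python below is a simplified sketch and an assumption-laden one: it
omits the unification step that Robinson's principle needs for the First
Order Predicate Calculus, and the encoding of literals as signed
integers and the function names are hypothetical choices made for the
example.

```python
from itertools import combinations


def resolve(c1, c2):
    """Return every resolvent of two clauses.

    A clause is a frozenset of non-zero integers; -p stands for NOT p.
    """
    return [
        frozenset((c1 - {lit}) | (c2 - {-lit}))
        for lit in c1
        if -lit in c2
    ]


def unsatisfiable(clauses):
    """Saturate the clause set; True if the empty clause is derived."""
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for resolvent in resolve(c1, c2):
                if not resolvent:  # empty clause: a contradiction
                    return True
                new.add(resolvent)
        if new <= known:           # nothing further can be derived
            return False
        known |= new


def entails(premises, goal_literal):
    """Proof by refutation: premises plus the negated goal must be unsatisfiable."""
    return unsatisfiable(list(premises) + [[-goal_literal]])
```

With P encoded as 1 and Q as 2, entails([[1], [-1, 2]], 2) returns True
(from P and "P implies Q", Q follows), while entails([[-1, 2]], 2)
returns False. The answer is derived by rule from the premises alone,
with no sample and no generalisation involved.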
--
David Longley (check end reply line #)
Longley Consulting London, UK
Behaviour Assessment & Profiling Technology,
Research, Data Analysis and Training Services,
Small IT Systems http://www.longley.demon.co.uk