No subject

Sun Apr 10 21:30:27 EST 2005

As the number of predicates increases, the proportion of linearly
separable functions becomes smaller, as is made clear by the
following extract from Wasserman (1989) discussing the concept of
linear separability:

    'We  have seen that there is no way to draw  a  straight 
    line subdividing the x-y plane so that the  exclusive-or 
    function  is represented. Unfortunately, this is not  an 
    isolated   example;  there  exists  a  large  class   of 
    functions  that cannot be represented by a  single-layer 
    network.  These  functions  are  said  to  be   linearly 
    inseparable,  and  they  set  definite  bounds  on   the 
    capabilities of single-layer networks.

Linear  separability  limits single-layer networks  to  classification 
problems  in which the sets of points (corresponding to input  values) 
can be separated geometrically. For our two-input case, the  separator 
is a straight line. For three inputs, the separation is performed by a 
flat plane cutting through the resulting three-dimensional space.  For 
four  or more inputs, visualisation breaks down and we  must  mentally 
generalise  to  a space of n dimensions divided by a  "hyperplane",  a 
geometrical   object  that  subdivides  a  space  of  four   or   more 
dimensions....  A neuron with n binary inputs can have 2^n
different input patterns, consisting of ones and zeros. Because each
input pattern can produce two different binary outputs, one and zero,
there are 2^(2^n) different functions of n variables.

As  shown [below], the probability of any randomly  selected  function 
being linearly separable becomes vanishingly small with even a  modest 
number of variables. For this reason single-layer perceptrons are,  in 
practice, limited to simple problems.

  n    2^(2^n)         Number of Linearly Separable Functions
  1         4                              4
  2        16                             14
  3       256                            104
  4    65,536                          1,882
  5  4.3 x 10^9                       94,572
  6  1.8 x 10^19                   5,028,134

P. D. Wasserman (1989)
Linear Separability: Ch2. Neural Computing Theory and Practice
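
The first rows of the table can be checked by brute force. The
following is only a sketch, not Wasserman's method: it assumes that
for n up to 3 every linearly separable function can be realised by a
threshold unit with small integer weights (here -3 to 3) and a
half-integer threshold. The function name and the max_weight
parameter are illustrative.

```python
from itertools import product

def count_linearly_separable(n, max_weight=3):
    """Count distinct Boolean functions of n inputs that a single
    threshold unit (output 1 when the weighted sum exceeds t) can compute.
    Assumption: integer weights in [-max_weight, max_weight] are enough,
    which holds for small n but is not claimed for large n."""
    inputs = list(product((0, 1), repeat=n))
    bound = n * max_weight
    # Half-integer thresholds covering every possible weighted sum,
    # including one below the minimum and one above the maximum.
    thresholds = [t + 0.5 for t in range(-bound - 1, bound + 1)]
    found = set()
    for w in product(range(-max_weight, max_weight + 1), repeat=n):
        sums = [sum(wi * xi for wi, xi in zip(w, x)) for x in inputs]
        for t in thresholds:
            # Record the truth table produced by this weight/threshold pair.
            found.add(tuple(int(s > t) for s in sums))
    return len(found)

for n in (1, 2, 3):
    print(n, 2 ** (2 ** n), count_linearly_separable(n))
```

For n = 2 the two functions that never appear are exclusive-or and
its complement, which is the case Wasserman describes.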

In later sections evidence is presented, in the context of clinical
vs. actuarial judgment, that human judgement is severely limited,
being able to process only a few variables. Beyond that, non-linear
fits become more frequent. This is discussed later in the context of
connectionist 'intuitive', inductive inference and constraints on
short-term or working memory span (cf. Kyllonen & Christal 1990 -
"Reasoning Ability Is (Little More Than) Working-Memory Capacity?!").
It is worth mentioning here, however, that in the epilogue to the
expanded reprint of their 1969 review of neural nets, 'Perceptrons -
An Introduction to Computational Geometry', Minsky and Papert (1988)
first reiterated their original criticism that neural networks had
only been shown to be capable of solving 'toy problems', i.e. problems
with a small number of dimensions, using 'hill climbing' algorithms.
They then effectively did a 'volte face' and said:

    'But  now  we propose a somewhat  shocking  alternative: 
    Perhaps  the scale of the toy problem is that on  which, 
    in  physiological actuality, much of the functioning  of 
    intelligence operates. Accepting this thesis leads  into 
    a  way  of  thinking very different  from  that  of  the 
    connectionist movement. We have used the phrase "society 
    of mind" to refer to the idea that mind is made up of  a 
    large  number of components, or "agents," each of  which 
    would  operate  on  the  scale  of  what,  if  taken  in 
    isolation, would be little more than a toy problem.'

M Minsky and S Papert (1988) p266-7

and a little later, which is very germane to the fragmentation of
behaviour view being advanced in this volume:

    'On   the  darker  side,  they   [parallel   distributed 
    networks] can limit large-scale growth because what  any 
    distributed network learns is likely to be quite  opaque 
    to other networks connected to it.'

ibid p.274

This *opacity* of aspects, or elements, of our own behaviour to
ourselves is central to the theme being developed in this volume,
namely that a science of behaviour must remain entirely extensional
and that there cannot therefore be a science or technology of
psychology to the extent that this remains intensional (Quine 1960,
1992). The discrepancy between experts' reports of the information
they use when making diagnoses (judgments) and the information they
actually use is reviewed in more detail in a later section. However,
research reviewed in Goldberg (1968) suggests that even where
diagnosticians are convinced that they use more than additive models
(i.e. use interactions between variables, which statistically may
account for some of the non-linearities), empirical evidence shows
that in fact they use only a few linear
combinations  of  variables  (cf. Nisbett and  Wilson  1977,  in  this 

As an illustration of methodological solipsism (intensionalism) in
practice, consider the following, which neatly brings out the subtle
difference between the methodological solipsist approach and that of
the methodological or 'evidential' behaviourist.

Several  years ago, a prison psychologist sought the views  of  prison 
officers and governors as to who they considered to be  'subversives'. 
Those  considered  'subversive' were flagged 1, those  not  considered 
subversive  were  flagged  0.  The  psychologist  then  used  multiple 
regression  to  predict  this classification from a  number  of  other 
behavioural  variables. From this he was able to produce  an  equation 
which  predicted subversiveness as a function of 4 variables:  whether 
or  not  the  inmate had a firearms offence  history,  the  number  of 
reports up to arrival at the current prison, the number of moves up to 
arrival where the inmate had stayed more than 28 days, and the  number 
of inmate assaults up to arrival.

Note that the dependent variable was binary, the inmate being
classified as 'subversive' or 'not subversive'. The prediction
equation, which differentially weighted the 4 variables, therefore
predicted the dependent variable as a value between 0 and 1. Now the
important thing to notice here is that the behavioural variables were
being used to predict something which is essentially a propositional
attitude, i.e. the degree of certainty of the officers' beliefs that
certain inmates were subversive.
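
The kind of equation described above can be sketched as follows. This
is an illustration only: the ten rows of data are invented for the
purpose, the function names are hypothetical, and the psychologist's
actual weights are not reproduced. Note that a linear fit to a 0/1
flag usually gives values between 0 and 1, but the model itself does
not guarantee that.

```python
def fit_least_squares(rows, targets):
    """Ordinary least squares with an intercept, solving the normal
    equations (X'X) b = X'y by Gauss-Jordan elimination."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coeffs, row):
    """Weighted sum of the predictors plus the intercept."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))

# INVENTED data, for illustration only. Columns: firearms offence (0/1),
# reports to arrival, moves of over 28 days to arrival, inmate assaults.
inmates = [(1, 12, 5, 3), (0, 2, 1, 0), (0, 9, 4, 2), (1, 3, 2, 0),
           (0, 1, 0, 0), (1, 15, 6, 4), (0, 7, 3, 1), (1, 10, 3, 2),
           (0, 4, 2, 1), (0, 11, 5, 1)]
# INVENTED staff flags: 1 = considered 'subversive', 0 = not.
flagged = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]

weights = fit_least_squares(inmates, flagged)
for row, flag in zip(inmates, flagged):
    print(row, flag, round(predict(weights, row), 2))
```

The fitted value for each inmate is a weighted sum of the four
behavioural measures, which is all that the prediction of the staff
flag amounts to here.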

The methodological solipsist may well hold that the officers' beliefs
are what are important. The methodological behaviourist, however,
would hold that what the officers thought was just *an approximation
of what the actual measures of inmate behaviour represented*, i.e.
their thoughts were just vague, descriptive terms for inmates who had
lots of reports, assaulted inmates and had been moved through lots of
prisons, and were probably in prison for violent offences. What the
officers thought was perhaps not all that important, since we could
just go to the records and identify behaviours which are
characteristic of troublesome behaviour and then identify inmates as a
function of those measures (cf. Williams and Longley 1986).

In the one case the concern is likely to be with developing better and 
better predictors of what staff THINK, and in the other, it becomes  a 
matter of simply recording better measures of classes of behaviour and 
empirically  establishing functional relations between those  classes. 
In the case of the former, intensional stance, one becomes interested
in the *psychology* of those exposed to such factors (i.e. those
exposed to the behaviour of inmates, and what they *vaguely or
intuitively describe it as*). From the extensional (methodological
behaviourist) stance defended in these volumes, such judgments can
only be a **function** of the data that staff have had access to. From
the extensional stance, one is simply interested in recording
*behaviour* itself and deducing implicit relations. Ryle (1949) and
many influential behaviourists since (Quine 1960) have, along with
Hahn (1933), suggested that this is our intellectual limit anyway:

    'It  is being maintained throughout this book that  when 
    we characterize people by mental predicates, we are  not 
    making  untestable inferences to any  ghostly  processes 
    occurring  in  streams  of consciousness  which  we  are 
    debarred  from visiting; we are describing the  ways  in 
    which those people conduct parts of their  predominantly 
    public behaviour.'

    G. Ryle
    The Concept of Mind (1949)

Using regression technology as outlined above is essentially how
artificial neural network software is used to make classifications. In
fact, there is now substantial evidence to suggest that the two
technologies are basically one and the same (Stone 1986), except that
in neural network technology the regression variable weights are
opaque to the judge, cf. Kosko (1992):

    'These properties reduce to the single abstract property
    of *adaptive model-free function estimation*: Intelligent
    systems adaptively estimate continuous functions from
    data without specifying mathematically how outputs
    depend on inputs...A function f, denoted f: X -> Y, maps
    an input domain X to an output range Y. For every
    element x in the input domain X, the function f uniquely
    assigns the element y to the output range Y.. Functions
    define causal hypotheses. Science and engineering paint
    our pictures of the universe with functions.'

    B. Kosko (1992)
    Neural  Networks and Fuzzy Systems: A Dynamical  Systems 
    Approach to Machine Intelligence p 19.
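
One way to see the claimed kinship between regression and neural
network technology is to train a single linear unit with the delta
rule and compare its weights with the closed-form least-squares fit.
The sketch below is illustrative only: the five data points, the
learning rate and the number of epochs are all assumptions chosen for
the example, and the function names are hypothetical.

```python
# INVENTED toy data: one predictor x and a binary outcome y.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.0, 1.0, 1.0, 1.0]

def closed_form(xs, ys):
    """Simple linear regression by the usual least-squares formulae."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def delta_rule(xs, ys, rate=0.05, epochs=5000):
    """A single linear unit trained by batch delta-rule updates, i.e.
    gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        errors = [y - (w * x + b) for x, y in zip(xs, ys)]
        w += rate * sum(e * x for e, x in zip(errors, xs)) / len(xs)
        b += rate * sum(errors) / len(xs)
    return w, b

print("least squares:", closed_form(xs, ys))
print("delta rule:   ", delta_rule(xs, ys))
```

With enough epochs the two pairs of values agree to several decimal
places. The unit has learned the regression weights, but nothing in
the training procedure ever states them as an equation.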

Today,  formal modelling of such intensional processes  is  researched 
using a  technology known as 'Neural Computing' which uses inferential 
statistical  technologies  closely  related  to  regression  analysis. 
However, such technologies are inherently inductive. They take samples 
and  generalise to populations. They are at best  pattern  recognition 

Such technologies must be contrasted with formal deductive logical
systems, which are algorithmic rather than heuristic (extensional
rather than intensional). The algorithmic, or computational, approach
is central to classic Artificial Intelligence and is represented today
by the technology of relational databases, along with rule and
Knowledge Information Based Systems (KIBS), which are based on the
First Order Predicate Calculus, the Robinson Resolution Principle
(Robinson 1965, 1979) and the long term objectives of automated
reasoning (e.g. Wos et al. 1992).

David Longley (check end reply line #)

Longley Consulting                                                  London, UK
Behaviour Assessment & Profiling Technology,
Research, Data Analysis and Training Services,
Small IT Systems                      

More information about the Neur-sci mailing list