machine brains

David Longley David at longley.demon.co.uk
Thu Nov 13 12:55:13 EST 1997


In article <MPG.ed4ed5b4d7b67a398968a at news3.wt.net>
           solntsev at wt.net "Alexander Solntsev" writes:

> I want to make a couple of points.
> 
> First on the subject of neural nets: one should not try to extend the 
> capabilities of such a computing device to its use as a model of a brain 
> simply on the basis that both apply similar terminology. For it is exactly 
> that which some researchers (and others) appear to be doing. A neural net 
> is an interesting computing device that is both intrinsically parallel and 
> distributed. Its operation is determined by multiple interconnected nodes, 
> not unlike that of a brain. Yet it is not the only device that exhibits 
> such properties. Actually, any system of multiple cooperating 
> interconnected nodes could look like a brain. We should avoid drawing 
> parallels between computing devices and the brain unless we can reasonably 
> show that such devices explain more than one aspect of the brain's 
> operation.
> 
> Second on the subject of linearity: most of us have become trapped in the 
> Von Neumann world of sequential computation with only neural nets seen as 
> the way out. Well, thank goodness, there are additional inherently parallel 
> and distributed computing devices that could save the day. While there are 
> several such devices, I will name the Dataflow computers specifically for I 
> find them most useful. Actually, a neural net could be considered a 
> special case of a Dataflow computer. What is a Dataflow computer? It is a 
> device composed of multiple cooperating interconnected nodes that represent 
> a dataflow graph of the computation they perform. Each node can have 
> multiple inputs and outputs connecting it to other nodes. Each node 
> represents a computing function based on those inputs, and that function is 
> executed when these inputs become available. All nodes exchange information 
> using messages that could represent complex data structures.
> 
> Just a couple of points (for now as this message is getting too long :-))
> 
> Alex

I'm not going to apologise for length, as discussing any of these
issues usefully tends to require books. The following is an extract
from "Fragments" on the subject. I would tidy up my usage of
inductive vs. deductive a little today, but the major theme remains.
Whilst we might be able to model some of the adaptive processes which
biological systems seem to instantiate, *THIS IS NOT AI BUT
PSYCHOLOGY*. We know that such systems are good models of the
heuristics which seem to be part of simple and complex animals'
repertoires, and my thesis is that we have, largely through
collective social efforts, discovered a technology (the extensional
stance) which goes far beyond what any of these adaptive systems are
capable of. That stance has led to an extensive technology based on
science, and it is THAT which we are really talking about when we
bandy the words "Artificial Intelligence" about. That's why just
about every "AI" development so rapidly becomes just one more aspect
of our engineering - it *IS* just an example of engineering (albeit
often hyped through anthropomorphic metaphor).

In  my view, the obstacle to this being more widely understood is  the 
muddle which currently pervades so much of contemporary psychology and 
cognitive  science. Whilst it may need some further work to  make  the 
point as clear as it should be, the seeds of this idea are outlined in 
"Fragments  of  Behaviour: The  Extensional  Stance"  (1993;1994;1997) 
which provides the theoretical background for how the philosophy of AI 
might be usefully applied in the field of "Corrections".

                                o o o

    'Connectionist  networks  are well  suited  to  everyday 
    common sense reasoning. Their ability to  simultaneously 
    satisfy  soft  constraints allows them  to  select  from 
    conflicting   information   in   finding   a   plausible 
    interpretation  of a situation. However, these  networks 
    are  poor at reasoning using the standard  semantics  of 
    classical logic, based on truth in all possible models.'

    M. Derthick (1990)
    Mundane Reasoning by Settling on a Plausible Model
    Artificial Intelligence, 46, 107-157

Connectionist  systems, it is claimed, do not represent  knowledge  as 
production rules, ie as well-formed formulae represented in the syntax 
of  the  predicate calculus (using conditionals, modus  ponens,  modus 
tollens  and  the  quantifiers), but  as  connection  weights  between 
activated predicates in a parallel distributed network:

    'Lawful  behavior  and judgments may be  produced  by  a 
    mechanism  in which there is no explicit  representation 
    of  the  rule. Instead, we suggest that  the  mechanisms 
    that   process   language   and   make   judgments    of 
    grammaticality are constructed in such a way that  their 
    performance  is characterizable by rules, but  that  the 
    rules  themselves  are  not  written  in  explicit  form 
    anywhere in the mechanism.'

    D. E. Rumelhart and J. L. McClelland (1986)
    Parallel Distributed Processing Ch. 18

Such    systems   are   function-approximation   systems,   and    are 
mathematically  a development of Kolmogorov's Mapping  Neural  Network 
Existence  Theorem  (1957). Such networks consist of three  layers  of 
processing  elements. Those of the bottom layer simply distribute  the 
input  vector (a pattern of 1s and 0s) to the processing  elements  of 
the  second  layer. The processing elements of this middle  or  hidden 
layer implement a *'transfer function'* (more on this below). The  top 
layer consists of output units.
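
As a minimal illustration of the architecture just described, here is
a sketch in Python (the weights are arbitrary and illustrative, and
the sigmoid transfer function is merely a stand-in, since the theorem
does not tell us which transfer functions the exact network would
use):

    import numpy as np

    def sigmoid(x):
        # Illustrative transfer function only: Kolmogorov's theorem
        # guarantees that suitable hidden-layer functions exist, not
        # how to find them.
        return 1.0 / (1.0 + np.exp(-x))

    def three_layer_net(x, w_hidden, w_out):
        # The bottom layer merely distributes the input vector; the
        # hidden layer applies the transfer function; the top layer
        # forms the output values.
        hidden = sigmoid(w_hidden @ x)
        return w_out @ hidden

    rng = np.random.default_rng(0)
    x = np.array([1.0, 0.0, 1.0])        # input pattern of 1s and 0s
    w_hidden = rng.normal(size=(4, 3))   # 3 inputs -> 4 hidden units
    w_out = rng.normal(size=(2, 4))      # 4 hidden units -> 2 outputs
    print(three_layer_net(x, w_hidden, w_out))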

An important feature of Kolmogorov's Theorem is that it is not 
constructive. That is, it is not algorithmic or 'effective'. Since the 
proof  of  the  theorem is not constructive, we do  not  know  how  to 
determine  the key quantities of the transfer functions.  The  theorem 
simply tells us that such a three-layer mapping network must exist. As 
Hecht-Nielsen (1990) remarks:

    'Unfortunately,  there  does not appear to be  too  much 
    hope  that  a method of finding the  Kolmogorov  network 
    will  be developed soon. Thus, the value of this  result 
    is  its  intellectual assurance that  continuous  vector 
    mappings   of  a  vector  variable  on  the  unit   cube 
    (actually,  the theorem can be extended to apply to  any 
    COMPACT, ie, closed and bounded, set) can be implemented 
    EXACTLY with a three-layer neural network.'

    R. Hecht-Nielsen (1990)
    Kolmogorov's Theorem
    Neurocomputing

That is, we may well be able to find weight-matrices which capture  or 
embody certain functions, but we may not be able to say  'effectively' 
what  the  precise equations are which  algorithmically  compute  such 
functions.  This is often summarised by statements to the effect  that 
neural  networks  can model or fit solutions to sample  problems,  and 
generalise to new cases, but they cannot provide a rule as to how 
they  make such classifications or inferences. Their ability to do  so 
is  distributed  across the weightings of the whole weight  matrix  of 
connections  between the three layers of the network. The above is  to 
be  contrasted  with the fitting of linear discriminant  functions  to 
partition or classify an N-dimensional space (N being a direct 
function   of   the  number  of  classes  or   predicates).   Fisher's 
discriminant analysis (and the closely related linear multiple
regression technology) arrives at the discriminant function
coefficients through the Gaussian method of least squares, each b
value and the constant being arrived at deductively via the solution
of simultaneous equations. Function approximation, ie the
determination of hidden-layer weights or connections, is instead
based on recursive feedback; elsewhere within behaviour science this
is known as 'reinforcement', the differential strengthening or
weakening of connections depending on feedback or knowledge of
results (a toy contrast between the two routes is sketched after the
quotation below). Kohonen (1988), commenting on "Connectionist
Models" in contrast to conventional, extensionalist relational
databases, writes:

    'Let  me make it completely clear that one of  the  most 
    central functions coveted by the "connectionist"  models 
    is the ability to solve *implicitly defined relational 
    structures*.  The latter, as explained in  Sect.  1.4.5, 
    are  defined  by  *partial relations*,  from  which  the 
    structures are determined in a very much similar way  as 
    solutions to systems of algebraic equations are  formed; 
    all  the  values  in the  universe  of  variables  which 
    satisfy  the  conditions  expressed  as  the   equations 
    comprise, by definition, the possible solutions. In  the 
    relational    structures,   the    knowledge    (partial 
    statements,   partial   relations)  stored   in   memory 
    constitutes  the universe of variables, from  which  the 
    solutions  must be sought; and the conditions  expressed 
    by  (eventually incomplete) relations, ie, the  "control 
    structure" [9.20] correspond to the equations.

    Contrary  to  the conventional database  machines  which 
    also  have  been  designed  to  handle  such  relational 
    structures, the "connectionist" models are said to  take 
    the relations, or actually their strengths into  account 
    statistically. In so doing, however, they only apply the 
    Euclidean  metric, or the least square loss function  to 
    optimize   the  solution.  This  is  not  a  very   good 
    assumption for natural data.'

    T. Kohonen (1988)
    Ch. 9 Notes on Neural Computing
    In Self-Organisation and Associative Memory
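
To make the contrast drawn above concrete, here is a toy sketch in
Python (the data are invented): the same regression coefficients are
first obtained deductively, by solving the normal equations exactly,
and then approached iteratively by the sort of error-driven weight
correction connectionist learning rules employ.

    import numpy as np

    # Invented sample: five cases, a constant term and two predictors.
    X = np.array([[1.0, 0.5, 1.2],
                  [1.0, 1.5, 0.3],
                  [1.0, 2.0, 2.1],
                  [1.0, 3.5, 1.0],
                  [1.0, 4.0, 2.5]])
    y = np.array([1.1, 1.9, 3.2, 4.1, 5.3])

    # Deductive route: solve the simultaneous (normal) equations
    # (X'X)b = X'y for the b values and the constant in one step.
    b_exact = np.linalg.solve(X.T @ X, X.T @ y)

    # Iterative route: repeated error-driven correction, ie the
    # differential strengthening and weakening of connection weights
    # depending on knowledge of results.
    b = np.zeros(3)
    lr = 0.01
    for _ in range(20000):
        error = y - X @ b                # knowledge of results
        b += lr * X.T @ error / len(y)   # adjust weights to cut error

    print(b_exact)  # coefficients arrived at by solving equations
    print(b)        # nearly identical, arrived at by feedback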
     
Throughout  the  1970s  Nisbett  and colleagues  studied  the  use  of 
probabilistic heuristics in real-world human problem solving, 
primarily in the context of Attribution Theory (H. Kelley 1967, 1972). 
Such  inductive  as opposed to deductive heuristics  of  inference  do 
indeed  seem  to be influenced by training (Nisbett and  Krantz  1983, 
Nisbett et al. 1987). Statistical heuristics are naturally applied in 
everyday  reasoning  if  subjects  are trained in  the  Law  of  Large 
Numbers. This is not surprising, since application of such  heuristics 
is an example of response generalisation - which is how  psychologists 
have  traditionally  studied the vicissitudes of  inductive  inference 
within  Learning  Theory.  As Wagner (1981) has pointed  out,  we  are 
perfectly  at liberty to use the language of Attribution Theory as  an 
alternative,  this  exchangeability  of  reference  system  being   an 
instance of Quinean Ontological Relativity, where what matters is  not 
so  much  the  names in argument positions,  or  even  the  predicates 
themselves,  but  the  *relations*  (themselves  at  least   two-place 
predicates) which emerge from such systems.

Under  most natural circumstances, inductive inference  is  irrational 
(cf.  Popper 1936, Kahneman et al. 1982, Dawes, Faust and Meehl  1989, 
Sutherland   1992).  This  is  because  it  is  generally   based   on 
unrepresentative   sampling   (drawing  on  the   'availability'   and 
'representativeness'  heuristics), and this is so simply because  that 
is  how data in a structured culture often naturally presents  itself. 
Research has therefore demonstrated that human inference is  seriously 
at odds with formal deductive logical reasoning, and with the algorithmic 
implementation  of  those inferential processes by  computers  (Church 
1936, Post 1936, Turing 1936). One of the main points of this paper is 
that  we  generally  turn  to  the  formal  deductive  technology   of 
mathematico-logical method (science) to compensate for the  heuristics 
and  biases which typically characterise natural inductive  inference. 
Where possible, we turn to *relational databases and 4GLs*  (recursive 
function  theory and mathematical logic) to provide  descriptive,  and 
deductively valid pictures of individuals and collectives.
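
As a toy illustration of what that extensional technology amounts to
(Python, with an invented relation): a query is answered by
exhaustive enumeration over the stored extension of a predicate,
rather than by whichever cases come most readily to mind.

    # An invented relation: the extension of a predicate, stored as
    # (case, offence, reconvicted) tuples.
    records = [
        ("A", "burglary", True),
        ("B", "fraud", False),
        ("C", "burglary", False),
        ("D", "burglary", True),
        ("E", "fraud", True),
    ]

    def p_reconvicted(relation, offence):
        # Relative frequency computed over the whole extension, not
        # over the cases an individual happens to remember.
        cases = [r for r in relation if r[1] == offence]
        return sum(r[2] for r in cases) / len(cases)

    # Two users asking the same question of the same relation must
    # get the same, deductively derived, answer.
    print(p_reconvicted(records, "burglary"))  # 2/3
    print(p_reconvicted(records, "fraud"))     # 1/2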

This large and unexpected body of empirical evidence from decision-
theory, cognitive experimental social psychology and Learning Theory
began accumulating in the mid to late 1970s (cf. Kahneman, Slovic and
Tversky 1982, Putnam 1986, Stich 1990), and began to cast serious
doubt on the viability of the 'computational theory' of mind (Fodor
1975, 1980) which was basic to functionalism (Putnam 1986). That is,
the substantial body of empirical evidence which accumulated within
Cognitive Psychology itself suggested that, contrary to the doctrine
of functionalism, there exists a system of independent, objective
knowledge and reasoning against which we can judge human, and other
animal, cognitive processing. However, it gradually became
appreciated that the digital computer is not a good model of human
information processing, at least not unless this is conceived in
terms of 'neural computing' (also known as 'connectionism' or
'Parallel Distributed Processing'). The application of formal rules
of logic and mathematics
to  the  analysis of behaviour solely within the  language  of  formal 
logic  is the professional business of Applied  Behaviour  Scientists. 
Outside  of the practice of those professional skills,  the  scientist 
himself is as prone to the irrationality of intensional heuristics  as 
are laymen (Wason 1966). Within the domain of formal logic applied  to 
the  analysis of behaviour, the work undertaken by applied  scientists 
is impersonal. The scientists' professional views are dictated by  the 
laws   of   logic  and  mathematics  rather  than   personal   opinion 
(heuristics).

The alternative, the intensional heuristics which are the mark of
natural human judgement (hence our rich folk psychological vocabulary
of metaphor), has to be contrasted with extensional analysis and
judgement using technology based on the deductive algorithms of the
First Order Predicate Calculus (Relational Database Technology). This
is  not  only  coextensive with the 'scope and  language  of  science' 
(Quine  1954) but is also, to the best of our knowledge from  research 
in  Cognitive  Psychology,  an effective compensatory  system  to  the 
biases of natural intensional, inductive heuristics (Agnoli and Krantz 
1989). Whilst a considerable amount of evidence suggests that training 
in formal logic and statistics is not in itself sufficient to suppress 
usage  of  intensional  heuristics  in any  enduring  sense,  ie  that 
generalisation  to  extra-training  contexts  is  limited,  there   is 
evidence  that judgement can be rendered more rational by training  in 
the use of extensional technology. The demonstration by Tversky and
Kahneman (1983) that subjects generally fail to apply the extensional
conjunction rule of probability, that a conjunction is always equally
or less probable than either of its conjuncts, and that this too is
generally resistant to counter-training, is another example, this
time within probability theory (a deductive system), of the failure
to apply extensional rules in applied contexts. Careful use of I.T.
and principles of deductive inference (e.g. semantic tableaux,
Herbrand models, and Resolution methods) promise, within the limits
imposed by Gödel's Theorem, to keep us on track if we restrict our
technology to the extensional.
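
The conjunction rule itself is a one-step deduction from the axioms
of probability. A minimal illustration in Python (the figures are
invented, in the style of the 'Linda' problem, not taken from the
1983 study):

    # A conjunction can never be more probable than either conjunct,
    # since P(A and B) = P(A) * P(B given A) and P(B given A) <= 1.
    p_teller = 0.3                  # P(A): invented for illustration
    p_feminist_given_teller = 0.5   # P(B|A): invented for illustration
    p_both = p_teller * p_feminist_given_teller

    assert p_both <= p_teller       # the extensional conjunction rule
    print(p_teller, p_both)         # 0.3 0.15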

The reawakening of interest in connectionism in the early to mid 1980s 
can  indeed  be  seen  as a vindication of  the  basic  principles  of 
behaviourism. What is psychological may well be impenetrable, for  any 
serious scientific purposes, not because it is in any way a  different 
kind  of 'stuff', but because structurally it amounts to no more  than 
an n-dimensional weight space, idiosyncratic and context specific,  to 
each and every one of us. 

    'Uncertain situations may be thought of as  disjunctions 
    of  possible  states: either one state will  obtain,  or 
    another....

    Shortcomings in reasoning have typically been attributed 
    to   quantitative   limitations  of  human   beings   as 
    processors of information. "Hard problems" are typically 
    characterized  by reference to the "amount of  knowledge 
    required," the "memory load," or the "size of the search 
    space"....Such limitations, however, are not  sufficient 
    to account for all that is difficult about thinking.  In 
    contrast  to many complicated tasks that people  perform 
    with  relative ease, the problems investigated  in  this 
    paper  are  computationally  very  simple,  involving  a 
    single  disjunction  of  two well  defined  states.  The 
    present   studies  highlight  the  discrepancy   between 
    logical  complexity  on the one hand  and  psychological 
    difficulty  on  the  other. In contrast  to  the  "frame 
    problem"  for example, which is trivial for  people  but 
    exceedingly  difficult  for  AI, the  task  of  thinking 
    through disjunctions is trivial for AI (which  routinely 
    implements "tree search" and "path finding"  algorithms) 
    but  very  difficult for people. The failure  to  reason 
    consequentially may constitute a fundamental  difference 
    between natural and artificial intelligence.'

    E. Shafir and A. Tversky (1992)
    Thinking through Uncertainty: Nonconsequential Reasoning 
    and Choice
    Cognitive Psychology, 24, 449-474
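
Computationally, 'thinking through a disjunction' is just exhaustive
case analysis, which is why it is trivial for a machine. A minimal
sketch in Python (the payoffs are invented): evaluate each action
under every possible state, and note when one action wins under both
disjuncts (the 'sure-thing' principle).

    # An invented two-state decision problem.
    payoff = {
        ("accept", "state_1"): 5, ("accept", "state_2"): 3,
        ("decline", "state_1"): 2, ("decline", "state_2"): 1,
    }
    actions = ("accept", "decline")
    states = ("state_1", "state_2")

    # Best action under each disjunct, found by enumeration.
    best = {s: max(actions, key=lambda a: payoff[(a, s)])
            for s in states}

    # If the same action is best whichever state obtains, it is best
    # outright; no need to know which state will in fact obtain.
    if len(set(best.values())) == 1:
        print("Dominant choice:", best[states[0]])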


