machine brains
David Longley
David at longley.demon.co.uk
Thu Nov 13 12:55:13 EST 1997
In article <MPG.ed4ed5b4d7b67a398968a at news3.wt.net>
solntsev at wt.net "Alexander Solntsev" writes:
> I want to make a couple of points.
>
> First on the subject of neural nets: one should not try to extend the
> capabilities of such computing device to its use as a model of a brain
> simply on the basis that both apply similar terminology. For it is exactly that
> which some researchers (and others) appear to be doing. A neural net is an
> interesting computing device that is both inherently parallel and
> distributed. Its operation is determined by multiple interconnected nodes,
> not unlike that of a brain. Yet it is not the only device that exhibits
> such properties. Actually, any system of multiple cooperating
> interconnected nodes could look like a brain. We should avoid making
> parallels between computing devices and brain unless we can reasonably show
> such devices as explaining more than one aspect of a brain's operation.
>
> Second on the subject of linearity: most of us have become trapped in the
> Von Neumann world of sequential computation with only neural nets seen as
> the way out. Well, thank goodness, there are additional inherently parallel
> and distributed computing devices that could save the day. While there are
> several such devices, I will name the Dataflow computers specifically for I
> find them most useful. Actually, a neural net could be considered a
> special case of a Dataflow computer. What is a Dataflow computer? It is a
> device composed of multiple cooperating interconnected nodes that represent
> a dataflow graph of the computation they perform. Each node can have
> multiple inputs and outputs connecting it to other nodes. Each node
> represents a computing function based on those inputs, and that function is
> executed when these inputs become available. All nodes exchange information
> using messages that could represent complex data structures.
>
> Just a couple of points (for now as this message is getting too long :-))
>
> Alex
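Alex's dataflow description (each node fires its function as soon as all of
its input tokens have arrived, and passes the result along the graph as
messages) can be sketched as a toy Python scheduler. The Node class and run
loop below are purely illustrative, not a model of any particular dataflow
machine:

```python
class Node:
    """One vertex of a dataflow graph: fires when all inputs are present."""
    def __init__(self, fn, n_inputs):
        self.fn = fn
        self.n_inputs = n_inputs
        self.inbox = {}        # input slot -> token value
        self.outputs = []      # list of (target_node, target_slot) edges

def run(tokens):
    """Drive the graph from a list of initial (node, slot, value) tokens."""
    queue = list(tokens)
    results = {}
    while queue:
        node, slot, value = queue.pop(0)
        node.inbox[slot] = value
        if len(node.inbox) == node.n_inputs:        # all inputs available: fire
            out = node.fn(*(node.inbox[i] for i in range(node.n_inputs)))
            node.inbox = {}
            if node.outputs:
                for target, tslot in node.outputs:  # forward result as messages
                    queue.append((target, tslot, out))
            else:
                results[node] = out                 # terminal node: record result
    return results

# Dataflow graph for (a + b) * (a - c): three nodes, fired by token arrival.
add = Node(lambda x, y: x + y, 2)
sub = Node(lambda x, y: x - y, 2)
mul = Node(lambda x, y: x * y, 2)
add.outputs = [(mul, 0)]
sub.outputs = [(mul, 1)]
out = run([(add, 0, 2), (add, 1, 3), (sub, 0, 2), (sub, 1, 1)])  # a=2, b=3, c=1
```

The order in which tokens arrive does not matter; execution is driven entirely
by data availability, which is what makes the scheme inherently parallel.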
I'm not going to apologise for length, as to discuss any of these
issues usefully generally requires a book. The following is an extract
from "Fragments" on the subject. I would tidy up my usage of inductive
vs. deductive a little today, but the major theme remains. Whilst we
might be able to model some of the adaptive processes which biological
systems seem to instantiate, *THIS IS NOT AI BUT PSYCHOLOGY*. We know
that such systems are good models of the heuristics which seem to be
part of simple and complex animals' repertoire, and my thesis is that
we have, largely through collective social efforts, discovered a
technology (the extensional stance) which goes far beyond what any of
these adaptive systems are capable of. That stance has led to an
extensive technology based on science, and it is THAT which we are
really talking about when we bandy the term "Artificial Intelligence"
about. That's why just about every "AI" development so rapidly becomes
just one more aspect of our engineering - it *IS* just an example of
engineering (albeit often hyped through anthropomorphic metaphor).
In my view, the obstacle to this being more widely understood is the
muddle which currently pervades so much of contemporary psychology and
cognitive science. Whilst it may need some further work to make the
point as clear as it should be, the seeds of this idea are outlined in
"Fragments of Behaviour: The Extensional Stance" (1993;1994;1997)
which provides the theoretical background for how the philosophy of AI
might be usefully applied in the field of "Corrections".
o o o
'Connectionist networks are well suited to everyday
common sense reasoning. Their ability to simultaneously
satisfy soft constraints allows them to select from
conflicting information in finding a plausible
interpretation of a situation. However, these networks
are poor at reasoning using the standard semantics of
classical logic, based on truth in all possible models.'
M. Derthick (1990)
Mundane Reasoning by Settling on a Plausible Model
Artificial Intelligence 46,1990,107-157
Connectionist systems, it is claimed, do not represent knowledge as
production rules, ie as well-formed formulae represented in the syntax
of the predicate calculus (using conditionals, modus ponens, modus
tollens and the quantifiers), but as connection weights between
activated predicates in a parallel distributed network:
'Lawful behavior and judgments may be produced by a
mechanism in which there is no explicit representation
of the rule. Instead, we suggest that the mechanisms
that process language and make judgments of
grammaticality are constructed in such a way that their
performance is characterizable by rules, but that the
rules themselves are not written in explicit form
anywhere in the mechanism.'
D E Rumelhart and D McClelland (1986)
Parallel Distributed Processing Ch. 18
Such systems are function-approximation systems, and are
mathematically a development of Kolmogorov's Mapping Neural Network
Existence Theorem (1957). Such networks consist of three layers of
processing elements. Those of the bottom layer simply distribute the
input vector (a pattern of 1s and 0s) to the processing elements of
the second layer. The processing elements of this middle or hidden
layer implement a *'transfer function'* (more on this below). The top
layer consists of output units.
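The three-layer arrangement just described can be made concrete with a
minimal forward pass, here with tanh as the hidden units' transfer function.
The weights below are hand-picked purely for illustration, so that the
network computes XOR, a mapping no single-layer device can capture:

```python
import math

def forward(x, W1, b1, W2, b2):
    # Bottom layer: the input vector x is simply distributed to every
    # hidden unit. Hidden layer: weighted sum, then the tanh transfer
    # function.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    # Top layer: linear output units over the hidden activations.
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(W2, b2)]

# Hand-picked (illustrative) weights implementing XOR.
W1, b1 = [[4, 4], [4, 4]], [-2, -6]
W2, b2 = [[0.5, -0.5]], [0.0]
for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(forward([a, b], W1, b1, W2, b2)[0], 2))
```

Note that nothing in the weight matrices states the XOR rule explicitly; the
rule is only characterisable from the network's behaviour, which is the point
of the Rumelhart and McClelland passage quoted earlier.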
An important feature of Kolmogorov's Theorem is that it is not
constructive. That is, it is not algorithmic or 'effective'. Since the
proof of the theorem is not constructive, we do not know how to
determine the key quantities of the transfer functions. The theorem
simply tells us that such a three layer mapping network must exist. As
Hecht-Nielsen (1990) remarks:
'Unfortunately, there does not appear to be too much
hope that a method of finding the Kolmogorov network
will be developed soon. Thus, the value of this result
is its intellectual assurance that continuous vector
mappings of a vector variable on the unit cube
(actually, the theorem can be extended to apply to any
COMPACT, ie, closed and bounded, set) can be implemented
EXACTLY with a three-layer neural network.'
R. Hecht-Nielsen (1990)
Kolmogorov's Theorem
Neurocomputing
That is, we may well be able to find weight-matrices which capture or
embody certain functions, but we may not be able to say 'effectively'
what the precise equations are which algorithmically compute such
functions. This is often summarised by statements to the effect that
neural networks can model or fit solutions to sample problems, and
generalise to new cases, but they cannot provide a rule as to how
they make such classifications or inferences. Their ability to do so
is distributed across the weightings of the whole weight matrix of
connections between the three layers of the network. The above is to
be contrasted with the fitting of linear discriminant functions to
partition or classify an N dimensional space (N being a direct
function of the number of classes or predicates). Fisher's
discriminant analysis (and the closely related linear multiple
regression technology) arrive at the discriminant function
coefficients through the Gaussian method of least squares, each b
value and the constant being arrived at deductively via the solution
of simultaneous equations. Function approximation, or the
determination of hidden-layer weights or connections, is instead based
on recursive feedback; elsewhere within behaviour science this is
known as 'reinforcement', the differential strengthening or weakening
of connections depending on feedback or knowledge of results. Kohonen
(1988), commenting on "Connectionist Models" in contrast to
conventional, extensionalist relational databases, writes:
'Let me make it completely clear that one of the most
central functions coveted by the "connectionist" models
is the ability to solve *implicitly defined relational
structures*. The latter, as explained in Sect. 1.4.5,
are defined by *partial relations*, from which the
structures are determined in a very much similar way as
solutions to systems of algebraic equations are formed;
all the values in the universe of variables which
satisfy the conditions expressed as the equations
comprise, by definition, the possible solutions. In the
relational structures, the knowledge (partial
statements, partial relations) stored in memory
constitutes the universe of variables, from which the
solutions must be sought; and the conditions expressed
by (eventually incomplete) relations, ie, the "control
structure" [9.20] correspond to the equations.
Contrary to the conventional database machines which
also have been designed to handle such relational
structures, the "connectionist" models are said to take
the relations, or actually their strengths into account
statistically. In so doing, however they only apply the
Euclidean metric, or the least square loss function to
optimize the solution. This is not a very good
assumption for natural data.'
T. Kohonen (1988)
Ch. 9 Notes on Neural Computing
In Self-Organisation and Associative Memory
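The contrast drawn above, deductive solution of simultaneous equations
versus recursive, error-driven 'reinforcement' of weights, can be seen in
miniature by fitting the same straight line both ways (toy data, purely
illustrative):

```python
# Toy data lying exactly on y = 1 + 2x (purely illustrative).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
n = len(xs)

# 1. Deductive route: solve the normal (simultaneous) equations directly.
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
b0 = (sy - b1 * sx) / n                          # intercept

# 2. Recursive-feedback route: repeatedly strengthen or weaken the
#    coefficients in proportion to the error ('reinforcement').
w0, w1, lr = 0.0, 0.0, 0.05
for _ in range(5000):
    for x, y in zip(xs, ys):
        err = (w0 + w1 * x) - y
        w0 -= lr * err
        w1 -= lr * err * x

print(b0, b1)    # arrived at deductively, in one step
print(w0, w1)    # arrived at by iterated feedback
```

Both routes converge on the same coefficients here, but only the first
yields them 'effectively', by an explicit closed-form derivation.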
Throughout the 1970s Nisbett and colleagues studied the use of
probabilistic heuristics in real world human problem solving,
primarily in the context of Attribution Theory (H. Kelley 1967, 1972).
Such inductive as opposed to deductive heuristics of inference do
indeed seem to be influenced by training (Nisbett and Krantz 1983,
Nisbett et al. 1987). Statistical heuristics are naturally applied in
everyday reasoning if subjects are trained in the Law of Large
Numbers. This is not surprising, since application of such heuristics
is an example of response generalisation - which is how psychologists
have traditionally studied the vicissitudes of inductive inference
within Learning Theory. As Wagner (1981) has pointed out, we are
perfectly at liberty to use the language of Attribution Theory as an
alternative, this exchangeability of reference system being an
instance of Quinean Ontological Relativity, where what matters is not
so much the names in argument positions, or even the predicates
themselves, but the *relations* (themselves at least two-place
predicates) which emerge from such systems.
Under most natural circumstances, inductive inference is irrational
(cf. Popper 1936, Kahneman et al. 1982, Dawes, Faust and Meehl 1989,
Sutherland 1992). This is because it is generally based on
unrepresentative sampling (drawing on the 'availability' and
'representativeness' heuristics), and this is so simply because that
is how data in a structured culture often naturally presents itself.
Research has therefore demonstrated that human inference is seriously
at odds with formal deductive logical reasoning, and the algorithmic
implementation of those inferential processes by computers (Church
1936, Post 1936, Turing 1936). One of the main points of this paper is
that we generally turn to the formal deductive technology of
mathematico-logical method (science) to compensate for the heuristics
and biases which typically characterise natural inductive inference.
Where possible, we turn to *relational databases and 4GLs* (recursive
function theory and mathematical logic) to provide descriptive, and
deductively valid pictures of individuals and collectives.
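As a toy illustration of what 'extensional' means here: in a relational
system a relation simply IS its set of tuples, and a query is a deductively
valid set operation over them. The relation names and data below are
hypothetical:

```python
# Each relation is nothing over and above its extension: a set of tuples.
attended = {("smith", "course_a"), ("smith", "course_b"),
            ("jones", "course_a")}
passed   = {("smith", "course_a"), ("jones", "course_a")}

# A query is a join plus a selection, valid by construction:
# who attended course_b and also passed course_a?
result = {person for (person, course) in attended
          if course == "course_b" and (person, "course_a") in passed}
print(result)
```

The answer follows deductively from the stored tuples; there is no weighting,
similarity metric, or 'plausible interpretation' involved.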
This large, and unexpected body of empirical evidence from decision-
theory, cognitive experimental social psychology and Learning Theory,
began accumulating in the mid to late 1970s (cf. Kahneman, Slovic and
Tversky 1982, Putnam 1986, Stich 1990), and began to cast serious doubt
on the viability of the 'computational theory' of mind (Fodor
1975,1980) which was basic to functionalism (Putnam 1986). That is,
the substantial body of empirical evidence which accumulated within
Cognitive Psychology itself suggested that, contrary to the doctrine
of functionalism, there exists a system of independent, objective
knowledge, and reasoning against which we can judge human, and other
animal cognitive processing. However, it gradually became appreciated
that the digital computer is not a good model of human information
processing, at least not unless this is conceived in terms of 'neural
computing' (also known as 'connectionism' or 'Parallel Distributed
Processing'). The application of formal rules of logic and mathematics
to the analysis of behaviour solely within the language of formal
logic is the professional business of Applied Behaviour Scientists.
Outside of the practice of those professional skills, the scientist
himself is as prone to the irrationality of intensional heuristics as
are laymen (Wason 1966). Within the domain of formal logic applied to
the analysis of behaviour, the work undertaken by applied scientists
is impersonal. The scientists' professional views are dictated by the
laws of logic and mathematics rather than personal opinion
(heuristics).
The alternative, intensional heuristics, which are the mark of natural
human judgement (hence our rich folk psychological vocabulary of
metaphor) have to be contrasted with extensional analysis and
judgement using technology based on the deductive algorithms of the
First Order Predicate Calculus (Relational Database Technology). This
is not only coextensive with the 'scope and language of science'
(Quine 1954) but is also, to the best of our knowledge from research
in Cognitive Psychology, an effective compensatory system to the
biases of natural intensional, inductive heuristics (Agnoli and Krantz
1989). Whilst a considerable amount of evidence suggests that training
in formal logic and statistics is not in itself sufficient to suppress
usage of intensional heuristics in any enduring sense, ie that
generalisation to extra-training contexts is limited, there is
evidence that judgement can be rendered more rational by training in
the use of extensional technology. The demonstration by Kahneman and
Tversky 1983, that subjects generally fail to apply the extensional
conjunction rule in probability that conjunctions are always equal or
less probable than its elements, and that this too is generally
resistant to counter-training, is another example, this time within
probability theory (a deductive system) of the failure of extensional
rules in applied contexts. Careful use of I.T. and principles of
deductive inference (e.g. semantic tableaux, Herbrand models, and
Resolution methods) promise, within the limits imposed by Godel's
Theorem, to keep us on track if we restrict our technology to the
extensional.
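The conjunction rule itself becomes trivially checkable once reasoning is
made extensional, ie once the sample space is enumerated. A toy space of
three coin tosses:

```python
from itertools import product
from fractions import Fraction

space = list(product("HT", repeat=3))   # exhaustive sample space: 3 tosses

def prob(event):
    return Fraction(sum(1 for o in space if event(o)), len(space))

def A(o):               # first toss is heads
    return o[0] == "H"

def B(o):               # at least two heads overall
    return o.count("H") >= 2

def A_and_B(o):
    return A(o) and B(o)

# The extensional conjunction rule holds by construction: a conjunction
# can never be more probable than either of its conjuncts.
assert prob(A_and_B) <= prob(A) and prob(A_and_B) <= prob(B)
print(prob(A), prob(B), prob(A_and_B))
```

Subjects asked to compare such events intensionally get this wrong; the
enumeration makes the rule inescapable.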
The reawakening of interest in connectionism in the early to mid 1980s
can indeed be seen as a vindication of the basic principles of
behaviourism. What is psychological may well be impenetrable, for any
serious scientific purposes, not because it is in any way a different
kind of 'stuff', but because structurally it amounts to no more than
an n-dimensional weight space, idiosyncratic and context specific, to
each and every one of us.
'Uncertain situations may be thought of as disjunctions
of possible states: either one state will obtain, or
another....
Shortcomings in reasoning have typically been attributed
to quantitative limitations of human beings as
processors of information. "Hard problems" are typically
characterized by reference to the "amount of knowledge
required," the "memory load," or the "size of the search
space"....Such limitations, however, are not sufficient
to account for all that is difficult about thinking. In
contrast to many complicated tasks that people perform
with relative ease, the problems investigated in this
paper are computationally very simple, involving a
single disjunction of two well defined states. The
present studies highlight the discrepancy between
logical complexity on the one hand and psychological
difficulty on the other. In contrast to the "frame
problem" for example, which is trivial for people but
exceedingly difficult for AI, the task of thinking
through disjunctions is trivial for AI (which routinely
implements "tree search" and "path finding" algorithms)
but very difficult for people. The failure to reason
consequentially may constitute a fundamental difference
between natural and artificial intelligence.'
E. Shafir and A. Tversky (1992)
Thinking through Uncertainty: Nonconsequential Reasoning
and Choice
Cognitive Psychology 24,449-474
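The point in the passage above, that 'thinking through' a disjunction is
mechanically trivial, can be shown in a few lines: enumerate the disjuncts,
evaluate each option under every one, and the choice may be settled before
the uncertainty is resolved. The two-state gamble below is hypothetical,
loosely after Shafir and Tversky's examples:

```python
def consequential_choice(options, states, value):
    """Reason 'through' the disjunction: evaluate every option in every
    possible state; if one option is best in all states, the choice is
    settled before we know which state actually obtains."""
    best_per_state = {s: max(options, key=lambda o: value(o, s))
                      for s in states}
    distinct = set(best_per_state.values())
    return distinct.pop() if len(distinct) == 1 else None

# Hypothetical gamble: having bet once, accept a second favourable bet?
states = ("won_first", "lost_first")
options = ("accept", "decline")

def value(option, state):
    base = 200 if state == "won_first" else -100
    return base + (25 if option == "accept" else 0)  # favourable either way

choice = consequential_choice(options, states, value)
```

The machine reaches the sure-thing answer by exhaustive enumeration; people,
on Shafir and Tversky's evidence, frequently fail to.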