No subject


Sun Apr 10 21:17:45 EST 2005


algorithms definitely need to design and train nets on their own. (By
the way, this is indeed a doable task and all neural network algorithms
need to focus on both of these tasks.) We cannot leave design out of
our algorithms. Our original intent is to build self-learning systems, not
systems that we have to "baby sit" all the time. Such systems are
"useless" if we want to build truly autonomous learning systems that can
learn on their own. "Learning" includes "design and training". We
cannot call them learning algorithms unless they design nets on their
own and unless they attempt to generalize (i.e. attempt to build the
smallest possible net).
 
I would welcome more thoughts and debate on all of these issues. It
would help to see some more response on two of the other premises of
classical connectionist learning - local learning and memoryless learning.
They have been the key concepts behind algorithm development in this
field for the last 40 to 50 years. Again, open and vigorous debate is
very healthy for a scientific field. Perhaps more researchers will come
forward with facts and ideas on these two and other issues.
********************************************************
********************************************************
On May 23 Danny Silver wrote:
 
"Dr. Roy ..
 It was interesting to read your mail on new criteria for neural network
based inductive learning.  I am sure that many other readers have at
one time or another had similar thoughts or portions thereof.
 Notwithstanding the need to walk before you run, there is reason to
set our sights a little higher than they have been.
 Along these lines I would like to point you toward a growing body of
work on Transfer in Inductive Systems which suggests that a "life long
learning" or "learning to learn" approach encompasses much of the
criteria which you have outlined. At NIPS*95 a post-conference
workshop covered this very topic and heard from some 15 speakers on
the subject. All those who are interested should search through the
homepages below for additional information."
Daniel L. Silver    University of Western Ontario, London, Canada
N6A 3K7 - Dept. of Comp. Sci. - Office: MC27b
dsilver at csd.uwo.ca  H: (519)473-6168   O: (519)679-2111 (ext.6903)
WWW home page ....  http://www.csd.uwo.ca/~dsilver
==================================================
 Workshop page:
 http://www.cs.cmu.edu/afs/cs.cmu.edu/usr/caruana/pub/transfer.html
 Lori Pratt's transfer page:
 http://vita.mines.colorado.edu:3857/0/lpratt/transfer.html
 Danny Silver's transfer ref list:
 http://www.csd.uwo.ca/~dsilver/ltl-ref-list
 Rich Caruana's transfer ref list:
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/caruana/pub/transferbib.html
********************************************************
********************************************************
On May 21 Michael Vanier wrote:
 
"I read your post to the computational neuroscience mailing list with
interest.  I agreed with most of your points about the differences
between "brain-like" learning and the learning exhibited by current
neural network models.  I have a couple of comments, for what it's
worth.
 
(On Task A: Perform Network Design Task)
 
As a student of neuroscience (and computational neuroscience), it isn't
clear to me what you're referring to when you say that the brain
designs an appropriate network for a given task.  One take on this is
that evolution has done just that, but evolution has operated over
millions of years. Biological development can also presumably tune a
network in response to inputs (e.g. the development of connectivity in
visual cortex in response to the presence or absence of visual stimuli),
but again, this is slow and relatively fixed after a certain period, so it
would only apply to generic tasks whose nature doesn't change
profoundly over time (which presumably is the case for early vision).  I
know of no example where the brain can massively rewire itself in
order to perform some task.  However, the kind of learning found in
connectionist networks (correlation-based using local learning rules)
has a fairly direct analogy to long-term potentiation and depression in
the brain, so it's likely that the brain is at least this powerful.  This
accounts for much of the appeal of local learning rules: you can find
them (or something similar to them) in the brain.  In fact, despite the
practical problems with backprop (which you mention), the most
common objection given by biologists to backprop is that even this
simple a learning rule would be very difficult to instantiate in a
biological system.
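 
To make "correlation-based learning using local learning rules" concrete, here is a minimal Hebbian-style update in which each weight changes using only the activity of the two units it connects. This is a sketch only; the learning rate, the decay term and the toy activities are illustrative assumptions, not anything taken from the discussion.
 
    import numpy as np

    def hebbian_update(w, pre, post, lr=0.01, decay=0.001):
        # Local rule: each weight w[i, j] changes using only the activity of
        # the postsynaptic unit i and presynaptic unit j it connects, plus a
        # small passive decay (an LTD-like stand-in).
        return w + lr * np.outer(post, pre) - decay * w

    # Toy usage with made-up activities: 3 presynaptic, 2 postsynaptic units.
    w = np.zeros((2, 3))
    pre = np.array([1.0, 0.0, 1.0])
    post = np.array([1.0, 0.5])
    w = hebbian_update(w, pre, post)
    print(w)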
 
(On Task C: Quickness in Learning)
 
This is indeed a problem.  Interestingly, attractor networks such as the
Hopfield net can in principle learn in one trial (although there are other
problems involved there too).  Hopfield nets are also fundamentally
feedback structures, like the brain but unlike most connectionist
models. This is not to suggest that Hopfield nets are good models of
the brain; they clearly aren't.
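 
For concreteness, a minimal sketch of the one-trial (single outer-product) storage and feedback recall that Hopfield-style attractor networks allow; the network size and the patterns are invented for illustration.
 
    import numpy as np

    def store(patterns):
        # Hebbian "one-trial" storage: each +/-1 pattern is written into the
        # weights by a single outer-product step, with no iterative training.
        n = patterns.shape[1]
        W = np.zeros((n, n))
        for p in patterns:
            W += np.outer(p, p)
        np.fill_diagonal(W, 0.0)
        return W / n

    def recall(W, x, steps=10):
        # Feedback recall: repeatedly threshold the net input until the
        # state settles into a stored attractor.
        for _ in range(steps):
            x = np.sign(W @ x)
            x[x == 0] = 1.0
        return x

    # Two made-up (orthogonal) patterns in a 6-unit net.
    patterns = np.array([[1, -1, 1, -1, 1, -1],
                         [1, 1, -1, -1, 1, 1]], dtype=float)
    W = store(patterns)
    probe = patterns[0].copy()
    probe[0] *= -1                      # corrupt one bit
    print(recall(W, probe))             # settles back to the first pattern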
 
It's not clear to me what you mean by "storing training examples in
memory".  Again using the Hopfield net example, in that case the
whole purpose of the network is to store patterns in memory.  Perhaps
what you're suggesting is that feedforward networks take advantage of
this to repeatedly play back memorized patterns from attractor
networks so as to make learning more rapid.  Some researchers believe
the hippocampus is performing this function by storing patterns when
an animal is awake and playing them back when the animal is asleep.
 
Thanks for an interesting post."
********************************************************
********************************************************
On May 15 Brendan McCane wrote:
 
" Hi, Just a few comments here. Although I think the points you make
are valid and probably desirable, I don't think they can necessarily be
applied to the human brain. Following are specific comments about the
listed criteria.
 
(On Task A: Design Networks)
 
 The neural network architecture of the brain is largely pre-determined.
 Tuning certainly takes place, but I do not believe that the entire brain
 architecture is rebuilt for every newborn. This would require
tremendous effort and probably end up with people who cannot
communicate with each other at all (due to different representations).
The human brain system has actually been created with external
assistance, namely from evolution.
 
 (On Task B: Robustness in Learning)
 
 I agree that no local-minima would be optimal, but humans almost
 certainly fall into local minima (due to lack of extensive input or
 whatever) and only jump out when new input comes to light.
 
(On Task D: Efficiency in Learning.)
 
 I don't see why insects or birds could not solve NP-hard problems
from an evolutionary point of view. That is, the solution has now been
 hard-wired into their brains after millions of years of learning.
 
 I am not convinced that these characteristics are more brain-like than
 classical connectionist ones. Certainly they are desirable, and are
 possibly the holy grail of learning, but I don't think you can make the
 claim that the brain functions in this way.
 
 
 I think I've addressed all the other points made below in the points
 above."
********************************************************
********************************************************
On May 15 Richard Kenyon wrote:
 
 "Here are my comments.
I think that what you are looking for is something along the lines of
a-life type networks which would evolve their design (much like the
brain; see Brendan's comment), since there is no obvious design for any
particular problem in the first place, and a net which can design a
network must already know something about the problem, which is
why you raise the issue. I think, though, that design is evolving, albeit at
the hands of connectionist scientists; i.e. the title of this list is one
such step in that evolution.
 
(On Task B: Robustness in Learning)
 
For me one of the key concepts in neural nets is graceful degradation,
the idea that when problems arise the networks don't just fall over. I
reckon that networks are still fairly brittle and that a lot needs to be
done in this area. However, I agree again with Brendan that our brains
suffer from local minima more than we would like to admit.
 
 (On Task C: Quickness in Learning)
 
Memory is indeed very important, but a lot has already been published
on the storage capacity of recurrent neural networks; it has not been
forgotten. Very idealistic, I'm afraid. Humans don't learn as
quickly as we might like to think. Our 'education' is a long drawn-out
process, and only every now and again do we experience enlightenment
in the grasping of a key concept. This does not happen quickly or that
often (relatively). The main factor affecting neural nets (imo) will be
parallel computers, at which point the net as we know it will not be
stand-alone but connected to many more; this principle, I think, is
the closest theorisation we have to the brain's parallelism. This is also
why hybrid systems are very interesting, as a parallel system will be able
to process output from many designs.
 
 (On Task D: Efficiency in Learning)
 
Obviously efficiency in learning is important, but for us humans this is
often mediated by efficient teaching, as in the case of some algorithms.
Self-organising nets offer some form of autonomy in learning, but
often end up doing it the same way over and over again, as do we.
Kohonen has interpreted this as a physiological principle, in that it
takes a lot of effort to sever old neural connections and establish a new
path for incorporating new ideas. Local minima have muscle.
 
(On Task E: Generalization in Learning)
 
The brain probably accepts some form of redundancy (waste).
I agree that the brain is one hell of an optimisation machine.
Intelligence, whatever task it may be applied to, is (again imho) one
long optimisation process. Generalisation arises (even emerges, or is a
side effect) as a result of ongoing optimisation, conglomeration,
reprocessing, etc. This is again very important, I agree, but I think (I
do anyway) we in the NN community are aware of this, as with much of
the above. I thought that apart from point A we were doing all of this
already, although to have it explicitly published is very valuable.
 
 I may be wrong
 
 > A good test for a so-called "brain-like" algorithm is to imagine it
 > actually being part of a human brain.
 
I don't think that many researchers would claim too much about
neural nets being very brain-like at all. The simulated neurons, whether
sigmoid or tansigmoid etc., do not behave very like real neurons at all,
which is why there is a lot of research into biologically plausible
neurons.
 
> Then examine the learning
 > phenomenon of the algorithm and compare it with that of the
 > human's. For example, pose the following question: If an algorithm
 > like back propagation is "planted" in the brain, how will it behave?
 > Will it be similar to human behavior in every way? Look at the
 > following simple "model/algorithm" phenomenon when the back-
 > propagation algorithm is "fitted" to a human brain. You give it a
 > few learning examples for a simple problem and after a while this
 > "back prop fitted" brain says: "I am stuck in a local minimum. I
 > need to relearn this problem. Start over again." And you ask:
 > "Which examples should I go over again?" And this "back prop
> fitted" brain replies: "You need to go over all of them.
 
I agree this is a limitation, but how is any net supposed to know what is
relevant to remember or even pay greater attention to? This is in part
the frame problem, which roboticists are having a great deal of fun
discussing.
 
> I don't
 > remember anything you told me." So you go over the teaching
 > examples again. And let's say it gets stuck in a local minimum again
 > and, as usual, does not remember any of the past examples. So you
 > provide the teaching examples again and this process is repeated a
 > few times until it learns properly. The obvious questions are as
 > follows: Is "not remembering" any of the learning examples a brain-
 > like phenomenon?
 
Yes and no; children often need to be told over and over again, and
this field is still in its infancy.
 
>Are the interactions with this so-called "brain-
  > like" algorithm similar to what one would actually encounter with a
 > human in a similar situation? If the interactions are not similar, then
 > the algorithm is not brain-like. A so-called brain-like algorithm's
 > interactions with the external world/teacher cannot be different
 > from that of the human.
 >
 > In the context of this example, it should be noted that
 > storing/remembering relevant facts and examples is very much a
 > natural part of the human learning process. Without the ability to
 > store and recall facts/information and discuss, compare and argue
 > about them, our ability to learn would be in serious jeopardy.
 > Information storage facilitates mental comparison of facts and
 > information and is an integral part of rapid and efficient learning. It
 > is not biologically justified when "brain-like" algorithms disallow
 > usage of memory to store relevant information.
 
I did not know they were not allowed, but perhaps they have been
left on the sidelines; again I refer you to recurrent nets.
 
 > Another typical phenomenon of classical connectionist learning is
 > the "external tweaking" of algorithms. How many times do we
 > "externally tweak" the brain (e.g. adjust the net, try a different
 > parameter setting) for it to learn? Interactions with a brain-like
 > algorithm has to be brain-like indeed in all respect.
 
An analogy here is perhaps taking a different perspective on a
problem; this is a very human parameter that we must tweak to make
progress.
 
 > It is perhaps time to reexamine the foundations of the neural
 > network/connectionist field. This mailing list/newsletter provides an
 > excellent opportunity for participation by all concerned throughout
 > the world. I am looking forward to a lively debate on these matters.
 > That is how a scientific field makes real progress.
 
I agree with the last sentiment."
********************************************************
********************************************************
On May 16 Chris Cleirigh wrote:
 
 "hi
good luck with your enterprise. I think if you aim to be consistent with
biology you have more chance of long-term success.
 
I'm no engineer -- I'm a linguist -- but I've read of Edelman's theory of
neuronal group selection, which seeks to explain categorisation
through Darwinian processes of variation and selection of populations
of neuronal groups in the brain. Are you motivated by such models?
 
One thing, you say:
 
 For neuroscientists and neuroengineers, it
 should open the door to development of brain-like systems they
 have always wanted - those that can learn on their own without any
 external intervention or assistance, much like the brain.
However, efficient learning does involve external intervention,
especially by other brains. Consider language learning and the
corrective role played by adults in teaching children."
********************************************************
********************************************************
On May 17 Kevin Gurney wrote:
 
" I read your (provocative) posting to the cogpsy mailing list and
would like to make some comments
(Your original remarks are enclosed in square brackets)
 
[A. Perform Network Design Task: A neural network/connectionist
learning method must be able to design an appropriate network for
a given problem, ... From a neuroengineering and neuroscience point of
view, this is an essential property for any "stand-alone" learning system
- ...]
 
It might be from a neuroengineering point of view but not from a
neuroscientific one. Real brains undergo a developmental process, much
of which is encoded in the organism's DNA. Thus, the basic
mechanisms of structural and trophic development are not thought to
be activity-driven per se. Mechanisms like Long Term Potentiation
(LTP) may be the biological correlate of connectionist learning (Hebb
rule) but are not responsible for the overall neural architecture at the
modular level, which includes the complex layering of the cortex.
 
I would take issue quite generally with your frequent invocation of the
neuroscientists in your programme. They *are* definitely interested
in discovering the nature of real brains - rather than super-efficient
networks that may be engineered - and I will bring this out in subsequent
points below.
 
[B. Robustness in Learning: The method must be robust so as
not to have the local minima problem, the problems of oscillation
and catastrophic forgetting, the problem of recall or lost memories
and similar learning difficulties.]
 
 Again, it may be the goal of neuro*engineers* to study ideal devices -
it is not the domain of neuroscientists.
 
[C. Quickness in Learning: The method must be quick in its
learning and learn rapidly from only a few examples, much as
humans do.]
 
Humans don't, in fact, learn from just a few examples in most
cognitive and perceptual tasks - this is a myth. The fine tuning of
visual and motor cortex which is a result of the critical period in
infanthood is a result of a continuous bombardment of the animal with
stimuli and tactile feedback. The same goes for language. The same
applies to the learning of any new skill, in fact (reading, playing a
musical instrument, etc.). These may be executed in an algorithmic,
serial-processing fashion until they become automatised in the
parallel processing of the brain (cf. Andy Clark's von Neumann
emulation by the brain).
 
 Many connectionists have imbued humans with god-like powers
which aren't there. It is true that we can learn one-off facts and add
them to our episodic memory, but this is not usually the kind of task
which nets are asked to perform.
 
[D. Efficiency in Learning: The method must be
computationally efficient in its learning when provided with a finite
number of training examples (Minsky and Papert [1988]). It must be
able to both design and train an appropriate net in polynomial time.]
 
Judd has shown that NN learning is intrinsically NP-complete in many
instances - there is no `free lunch'. See also the results in
computational learning theory by Wolpert and Schaffer.
 
[E. Generalization in Learning: ... That is, it must try to design the
smallest possible net, ... This property is based on the
notion that the brain could not be wasteful of its limited resources,
so it must be trying to design the smallest possible net for every
task.]
 
 Not true. Visual cortex uses a massive expansion in its coding from
the LGN to V1 before it `recompresses' in higher visual centres. This
has been described theoretically in terms of PCA etc (ECVP last year -
can't recall ref. just now)
 
[As far as I know, there is no biological evidence for any of the
premises of classical connectionist learning.]
 The relation LTP = Hebb rule is a fairly non-contentious statement in
the neuroscientific community.
 
 I could go on (RP learning and operant conditioning etc)...
 
[So, who should construct the net for a neural net
algorithm? The answer again is very simple: Who else, but the
algorithm itself!]
 
The brain uses many `algorithms' to develop - it is these working in
concert (genetically determined and activity-mediated) which ensure
the final state.
 
[You give it a
few learning examples for a simple problem and after a while this
"back prop fitted" brain says: "I am stuck in a local minimum. I
need to relearn this problem. Start over again."]
 
My brain constantly gets stuck in local minima. If not, then I would
learn everything I tried to do to perfection - I would be an
accomplished craftsman/musician/linguist/sportsman etc. In fact I am
none of these... but rather have a small amount (a local minimum's worth)
of ability in each.
 
[The obvious questions are as
follows: Is "not remembering" any of the learning examples a brain-
like phenomenon?]
 
There may be some mechanism for storing the `rules' and `examples'
in STM or even LTM, but even this is not certain (e.g. `Now describe
to me the perfect tennis backhand...' `No - you forgot to mention the
follow-through - how many more times...').
 
Finally, an engineering point. The claim that previous connectionist
algorithms are not able to construct networks is a little brash. There
have been several attempts to construct nets as part of the learning
process (e.g. cascade correlation).
 
  In summary:
 
I am pleased to see that people are trying to overcome some of the
problems encountered in building neural nets. However, I would urge
people not to misappropriate the activities of people in other fields
(neuroscience) and to learn a little more about the real capabilities of
humans and their brains as described by neuroscientists and
psychologists. I would also ask that more account be taken of some of
the theoretical literature on learning.
 
 I hope this contribution is useful"
********************************************************
********************************************************
On May 18 Craig Hicks wrote:
 
" Hi,
 >A. Perform Network Design Task: A neural network/connectionist
 >learning method must be able to design an appropriate network for
 >a given problem, since, in general, it is a task performed by the
 >brain. A pre-designed net should not be provided to the method as
 >part of its external input, since it never is an external input to the
 >brain. From a neuroengineering and neuroscience point of view, this
 >is an essential property for any "stand-alone" learning system - a
>system that is expected to learn "on its own" without any external
 >design assistance.
 
 Doesn't this ignore the role of evolution as a "learning" force?
It's indisputable that the brain has a highly specialized structure.
Obviously, this did not come from nowhere, but is the result of the
forces of natural selection."
********************************************************
********************************************************
On May 23 Dragan Gamberger wrote:
 
"I read your submission with great interest, although (or maybe
because of the fact that) I am not working in the field of neural
networks. My interests are in the field of inductive learning. The
presented ideas seem very attractive to me and in my opinion your
criticism of the present systems is fully justified. The only
suggestion for improvement is on part C:
 
 > C.  Quickness in Learning: The method must be quick in its
 > learning and learn rapidly from only a few examples, much as
 > humans do. For example, one which learns from only 10 examples
 > learns faster than one which requires a 100 or a 1000 examples.
 
Although the statement is not incorrect by itself, in my opinion it
reflects the common unawareness of the importance of redundancy
for machine, as well as for human, learning. In practice neither machine
nor human can learn something (except extremely simple concepts)
from 10 examples, especially if there is noise (errors in training
examples). Even for learning simple concepts it is advisable to use
as many training examples as possible (and not only a necessary subset)
because this can improve the quality and (at least) the reliability of
induced concepts. Especially for handling imperfections in training data
(noise), the use of a redundant training set is obligatory.
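 
A small numerical sketch of this point, with an invented 1-D threshold concept, an assumed 15% label-noise rate, and sample sizes of 10 versus 1000; none of these particulars come from the message above.
 
    import numpy as np

    rng = np.random.default_rng(0)
    TRUE_THRESHOLD = 0.5     # the concept: label is 1 iff x > 0.5
    NOISE = 0.15             # 15% of the labels are flipped (errors in examples)

    def learn_threshold(n):
        # Pick the candidate threshold that misclassifies the fewest noisy examples.
        x = rng.uniform(0, 1, n)
        y = (x > TRUE_THRESHOLD).astype(int)
        flip = rng.uniform(0, 1, n) < NOISE
        y[flip] = 1 - y[flip]
        candidates = np.linspace(0, 1, 201)
        errors = [np.sum((x > t).astype(int) != y) for t in candidates]
        return candidates[int(np.argmin(errors))]

    for n in (10, 1000):
        runs = np.array([learn_threshold(n) for _ in range(200)])
        print(n, "examples: mean error of learned threshold =",
              round(float(np.mean(np.abs(runs - TRUE_THRESHOLD))), 3))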
 
 In practice, humans can and do induce concepts from a small training
set but they are 'aware' of their unreliability and use every occasion
 (additional examples) to test induced concepts and to refine them if
necessary. That is potentially the ideal model of incremental learning."
********************************************************
********************************************************
On May 25 Guido Bugmann responded to Raj Rao:
 
"A similar question (are there references for 1 million neurons lost
per day?) came up in a discussion on the topic of robustness
on connectionists a few years ago (1992). Some of the replies were:
  -------------------------------------------------------
 From Bill Skaggs, bill at nsma.arizona.edu :
 
 There have been a number of studies of neuron loss in aging.
 It proceeds at different rates in different parts of the brain,
 with some parts showing hardly any loss at all. Even in
 different areas of the cortex the rates of loss vary widely,
 but it looks like, overall, about 20% of the neurons are lost
 by age 60.
 
Using the standard estimate of ten billion neurons in the
neocortex, this works out to about one hundred thousand
neurons lost per day of adult life.
 
 Reference:
 "Neuron numbers and sizes in aging brain: Comparisons of
human, monkey and rodent data" DG Flood & PD Coleman,
 Neurobiology of Aging, 9, (1988) pp.453-464.
 --------------------------------------------------------
From Arshavir Blackwell, arshavir at crl.ucsd.edu :
 
 I have come across a brief reference to adult neural
 death that may be of use, or at  least a starting point.
 The book is:
 
Dowling, J.E. 1992. Neurons and Networks. Cambridge: Harvard
Univ. Press.
 
 In a footnote (!) on page 32, he writes:
 
 There is typically a loss of 5-10 percent of brain tissue with age.
 
 Assuming a brain loss of 7 percent over a life span of 100 years,
 and 10^11 neurons (100 billions) to begin with, approximately
 200,000 neurons are lost per day.
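 
A back-of-envelope check of the two figures quoted above (the span of adult life assumed for the first estimate is an assumption, since the original reply does not state it):
 
    # Skaggs: ~20% of ~10 billion neocortical neurons lost by age 60.
    lost = 0.20 * 10e9
    adult_days = 55 * 365            # assuming roughly 55 adult years
    print(lost / adult_days)         # ~1e5, i.e. about 100,000 per day

    # Dowling (as summarized above): 7% of 10^11 neurons over 100 years.
    lost = 0.07 * 1e11
    print(lost / (100 * 365))        # ~1.9e5, i.e. about 200,000 per day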
 ----------------------------------------------------------------
 From Jan Vorbrueggen, jan at neuroinformatik.ruhr-uni-bochum.de
 
 As I remember it, the studies showing the marked reduction
 in nerve cell count with age were done around the turn of the
 century. The method, then as now, is to obtain brains of deceased
 persons, fix them, prepare cuts, count cells microscopically
 in those cuts, and then estimate the total number by multiplying
 the sampled cells/(volume of cut) with the total volume.
 This method has some obvious systematic pitfalls, however.
 The study was done again some (5-10?) years ago by a German
 anatomist (from Kiel I think), who tried to get these things
 under better control. It is well known, for instance, that
 tissue shrinks when it is fixed; the cortex's pyramidal cells
 are turned into that form by fixation. The new study showed
 that the total water content of the brain does vary dramatically
with age; when this is taken into account, it turns out that
 the number of cells is identical within error bounds (a few
 percents?) between quite young children and persons up to
 60-70 years of age.
 
 All this is from memory, and I don't have access to the
 original source, unfortunately; but I'm pretty certain that
 the gist is correct. So the conclusion seems to be that the
 cell loss with age in the CNS is much lower than generally
 thought.
 ----------------------------------------------------------------
 From Paul King, Paul_King at next.com
 
 Moshe Abeles in Corticonics (Cambridge Univ. Press, 1991)
 writes on page 208 that:
 
 "Comparisons of neural densities in the brain of people
 who died at different ages (from causes not associated
 with brain damage) indicate that about a third of the
cortical cells die between the ages of twenty and eighty
 years (Tomlinson and Gibson, 1980). Adults can no longer
 generate new neurons, and therefore those neurons that
 die are never replaced.
  The neuronal fallout proceeds at a roughly steady
 rate throughout adulthood (although it is accelerated when
 the circulation of blood in the brain is impaired). The rate
 of neuronal fallout is not homogeneous throughout all
 the cortical regions, but most of the cortical regions
 are affected by it.  Let us assume that every year about 0.5% of the
 cortical cells die at random...."  and goes on to discuss the
implications for network robustness.
 
 Reference:
 
Henderson G, Tomlinson BE and Gibson PH (1980) "Cell counts
in human cerebral cortex in normal adults throughout life
using an image analysis computer" J. Neurol. Sci., 46, pp. 113-136.
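 
As a rough consistency check (nothing more), compounding the quoted 0.5% annual loss over the sixty years between twenty and eighty gives a figure in the same ballpark as "about a third":
 
    remaining = (1 - 0.005) ** 60    # 0.5% random loss per year, ages 20 to 80
    print(1 - remaining)             # ~0.26, roughly a quarter to a third lost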
 -------------------------------------------------------------
 From Robert A. Santiago, rsantiag at note.nsf.gov
 
 "In search of the Engram"
 
The problem of robustness from a neurobiological
perspective seems to originate from work done by
Karl Lashley. He sought to find how memory was
partitioned in the brain. He thought that memories
were kept on certain neuronal circuit paths (engrams)
and experimented under this hypothesis by cutting
out parts of the cortex and seeing if it affected
memory...  Other work was done by a gentleman named
Richard F. Thompson. Both speak of the loss of
neurons in a system and how integrity was kept.
In particular Karl Lashley spoke of memory
as holograms...
 -------------------------------------------------
  Hope it helps..."
******************************************************
******************************************************
On May 23 Daniel Crespin wrote:
 
"Dear Dr. Asim Roy:
 
 Your message on the subject "Connectionist Learning - Some New
 Ideas" is indeed interesting. The papers abstracted below seem
 relevant to the Ideas/Questions you state. In fact, the problems
 of A) Network design, B) Robustness in learning, C) Quickness in
 learning, D) Efficiency in learning, and E) Generalization in
learning, are solved with the algorithms explained in [1] and
[2]. Detailed comments, divided into six parts, can be found
 below, after the abstracts.
 
 To obtain  the preprints use a Web browser and the following URL:
 
 http://euler.ciens.ucv.ve/Professors/dcrespin/Pub/
 
                              **ABSTRACTS**
 
[1]. Neural polyhedra: Explicit formulas to realize any
 polyhedron as a three layer perceptron neural network. Useful to
 calculate directly and without training the architecture and
 weights of a network that executes a given pattern recognition
 task. See preprint below. 8 pages.
 
[2]. Pattern recognition with untrained perceptrons: Shows how to
 construct polyhedra directly from given pattern recognition
 data. The perceptron network associated to these polyhedra (see
 preprint above) solves the recognition problem. 10 pages.
 
[3]. Neural network formalism: Neural networks are defined using
 only elementary concepts from set theory, without the usual
 connectionistic graphs. The typical neural diagrams are derived
 from these definitions. This approach provides mathematical
 techniques and insight to develop theory and applications of
 neural networks. 8 pages
 
[4]. Geometry of perceptrons: It is proved that perceptron
 networks are products of characteristic maps of polyhedra. This
 gives insight into the geometric structure of these networks.
 The result also holds for more general (algebraic, etc.)
 perceptron networks, and suggests a new technique to solve
 pattern recognition problems. See other preprints in this
 location. 3 pages.
 
[5]. Generalized Backpropagation: Global backpropagation
formulas for differentiable neural networks are considered from
the viewpoint of minimization of the quadratic error using the
 gradient method. The gradient of (the quadratic error function
 of) a processing unit is expressed in terms of the output error
and the transposed derivative of the unit with respect to the
 weight. The gradient of the layer is the product of the
 gradients of the processing units. The gradient of the network
 equals the product of the gradients of the layers.
 Backpropagation provides the desired outputs or targets for the
 layers. Standard formulas for semilinear networks are deduced as
 a special case.
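 
As a minimal illustration of the standard picture summarized in [5] (gradient descent on the quadratic error, with the output error passed back through the transposed derivative of each layer), the following two-layer sketch uses arbitrary sizes, a tanh hidden layer and a linear output layer; none of these choices are taken from the preprint itself.
 
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=(4, 3))          # four inputs of dimension three
    target = rng.normal(size=(4, 2))     # desired outputs
    W1 = rng.normal(size=(3, 5))
    W2 = rng.normal(size=(5, 2))

    for _ in range(200):
        h = np.tanh(x @ W1)              # hidden layer
        y = h @ W2                       # linear output layer
        err = y - target                 # gradient of 0.5*||y - target||^2 w.r.t. y
        # The output error is propagated back through the transposed
        # derivative of each layer; the network gradient is the composition
        # of the layer gradients, as the abstract describes.
        grad_W2 = h.T @ err
        back_h = err @ W2.T
        grad_W1 = x.T @ (back_h * (1 - h ** 2))
        W1 -= 0.01 * grad_W1
        W2 -= 0.01 * grad_W2

    print(0.5 * np.sum((np.tanh(x @ W1) @ W2 - target) ** 2))  # error after training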
 
 **COMMENTS TO THE MESSAGE OF DR. ASIM ROY**
 (This is a comment on Task A: Design Networks.)
 
 ### Comment (1 of 6) to A: One of the basic purposes of neural
 networks is to perform pattern recognition, and this can be
 reduced in most (perhaps all) cases to the following: Given a
 finite set A of examples and a finite set B of counterexamples,
 with A and B non-empty disjoint subsets of n-dimensional space
 R^n, define in an explicit way the characteristic function f of
 a region R of R^n such that A is contained in the interior of R
 and B is disjoint from the closure of R. One says that f
 discerns or recognizes A and B. The differentiation of A and B
 is accomplished by f because f(a)=1 for all a in A and
 f(b)=0 for all b in B. A special case occurs when all elements
of A and B are binary vectors. Recall from [2] that if new
examples are added, say points in a set A', they are expected to
lie in R. Similarly, additional counterexamples B' are expected
to lie in the exterior of R. If this is the case the region R is
'good'. Otherwise one has 'overfitting', 'underfitting' or
both, that is, 'unfitting'.
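 
To make the definitions above concrete, here is a toy stand-in for the region R: an axis-aligned box around A. This is not Crespin's construction, only an illustration of a characteristic function f with f(a)=1 for all a in A and f(b)=0 for all b in B; the data points are invented.
 
    import numpy as np

    A = np.array([[0.2, 0.3], [0.4, 0.5], [0.3, 0.2]])   # examples
    B = np.array([[0.9, 0.9], [0.8, 0.1]])               # counterexamples

    lo, hi = A.min(axis=0) - 0.05, A.max(axis=0) + 0.05  # a box R with A in its interior

    def f(x):
        # Characteristic function of the region R: 1 inside the box, 0 outside.
        return int(bool(np.all(x > lo) and np.all(x < hi)))

    print([f(a) for a in A])   # all 1: A lies in the interior of R
    print([f(b) for b in B])   # all 0: B lies outside the closure of R (for this data)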
 
 The standpoint taken here is that pattern recognition consists in
 differentiating or discerning two classes of input objects, the
elements of A and B. More complex tasks can in many cases be
 reduced to this.
 
 Note that the function f has the following reductionistic
 effect: The output is 1 for all points of A and 0 for all points
 of B. Therefore from the viewpoint of the output all the
 elements of A are considered identical to each other and
 similarly for elements of B.
 
 On the other hand, A contained in the interior of R means that
some additional room is left in R, hopefully for the extra points of A'.
 And since B is in the exterior of R, room is in principle
 available, hopefully for points of B'.
 
 It should be clear that both in theory and in practical situations,
 under- and over-fitting is extremely difficult to rule out in
 advance. If a neural network is interpreted as some form of
 'knowledge' then this means that NN knowledge is imperfect,
 being always subject to refinement and/or to refutation.
 Metaphorical continuation of this leads to: Perfect knowledge is
transcendental. It is beyond neural networks and probably beyond
 human brains (minds?) to function in infallible ways, even in the
case of single  issues.
 
 The usual paradoxes then appear. The statement "Knowledge can
 always be refuted" is knowledge in a human brain. If brains are
 considered as some sort of perceptron neural network then the
 knowledge they carry is refutable. But in the present case the
 refutation implies that there exists knowledge that cannot be
 refuted. Thus, the classical epistemological problems reappear
 in the context of neural networks.
 
It is proved in preprint [4] above that if the threshold
 functions are discontinuous (Heaviside functions) then
 perceptron neural networks with n real valued inputs
 (equivalently, with  a single input equal to a vector in R^n)
 are products of characteristic maps of polyhedra contained in
 R^n. Therefore, recognition of patterns with the aid of
perceptron networks is included, at least in principle, in the
 general case of specifying a region R, which for perceptrons is
 a polyhedron.
 
 The region R that discerns A and B has two important properties:
 1) Depends on A and B. 2) Is non-unique.  In particular, for
perceptrons the polyhedron R depends on the data. It is shown in
[2] how to construct, given A and B, a suitable polyhedron R.
This polyhedron can then be realized as a perceptron neural
network with at most three layers, as shown in [1]. Actual
algorithms to construct the polyhedron and the network are given
in [1] and [2], and these algorithms in fact carry out the
'design' of a classical perceptron neural network specifically
 suited to the data of the recognition problem.
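 
To illustrate the kind of realization referred to in [1], here is a hand-built sketch in which a convex polyhedron (the unit square, chosen arbitrarily) is the AND of Heaviside half-space units, and a further OR layer would combine several such convex cells. The weights below are written down by hand, not produced by the algorithms of [1] or [2].
 
    import numpy as np

    def step(z):
        # Heaviside threshold unit.
        return (z > 0).astype(float)

    # Layer 1: one threshold unit per face (half-space w.x + b > 0).
    # These four faces bound the unit square 0 < x1 < 1, 0 < x2 < 1.
    W1 = np.array([[ 1.0,  0.0],
                   [-1.0,  0.0],
                   [ 0.0,  1.0],
                   [ 0.0, -1.0]])
    b1 = np.array([0.0, 1.0, 0.0, 1.0])

    # Layer 2: an AND unit that fires only when all four face units fire,
    # i.e. when the input lies inside the convex polyhedron (the square).
    # Layer 3: an OR over several convex cells would go here; with a single
    # cell it simply passes the AND unit through.
    w2, b2 = np.ones(4), -3.5

    def network(x):
        faces = step(W1 @ x + b1)          # which half-spaces contain x
        return float(w2 @ faces + b2 > 0)  # AND of the faces

    print(network(np.array([0.5, 0.5])))   # 1.0: inside the square
    print(network(np.array([1.5, 0.5])))   # 0.0: outside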
 
 The non-uniqueness of the polyhedron means that there are in
 principle many different perceptron architectures and choice of
 weights that recognize A and B. One consequence is that several
networks that recognize A and B can perform in disparate ways
with the additional data A' and B'. This can be called "plurality of
 agreement": The networks agree on the 'basic issue' of
recognizing A and B but differ on the more difficult extension
 of the original problem, namely, discerning the larger data sets
 obtained when A' is added to A and B' is added to B. If
modelling brains with NN's is a valid procedure then plurality
of agreement could perhaps be related to the variable results of
education and to the diversity of behaviour within an
otherwise uniform group of individuals.
 
 *** END OF COMMENT 1 OF 6 ***
(This is a comment on Task B: Robustness in Learning.)
 
### Comment (2 of 6) to B: The methods of [1] and [2] are robust
 in the sense indicated: they have nothing to do with minimizing
 error functions, backpropagation, local minima, flat regions,
 etc. If the data sets A and B are given then the algorithms
 produce the architecture and weights of a neural network that
 defines a polyhedron R with the required properties. No
 'learning difficulties' appear. However, 'performance
 difficulties' due to unpredictable fitting can always show up.
 
 About the performance expected from artificial neural networks,
 statement B represents a reasonable goal. Unless, of course, a science
 fiction scenario appears, with computers developing
their own interests and dominating or wiping out humans, for the
 moment an unlikely situation. Neural networks do not even seem
 currently powerful enough to merit special attention within the
 general political issues surrounding Computocracy.
 
 The observation and study of bird wings and bird flight was an
important historical step in the development of flying machines.
 Natural bird wings have feathers, airplane wings do not.
 Artificial flight outperforms flying animals at least in some
 aspects like speed and carrying capacity. Feathers in airplane
 wings would probably be a nuisance. Similar comments can be made
 about land transportation with regard to legs of horses and
wheels of cars. The point is that imitating Mother Nature is no
 doubt useful but does not have to be carried to extremes. Note that
once airplanes become available the purpose arose of using them to
drop explosives, chemicals and biological warfare agents. The political
 issues are hard to avoid.
 
Because of possible misinterpretations and ethical concerns the
 following phrase of Dr. Roy requires a separate comment.
 
 "Some people might argue that ordinary brains, and particularly
 those with learning disabilities, do exhibit such problems and
 that these learning requirements are the attributes only of a
 "super" brain. The goal of neuroengineers and neuroscientists is
 to design and build learning systems that are robust, reliable
 and powerful. They have no interest in creating weak and
 problematic learning devices that need constant attention and
 intervention"
 
 After a reference to 'ordinary brains' and to 'those with
 learning disabilities' the term 'super brain' appears and the
 goal is set for neuroengineers and neuroscientists to design
 things 'robust, reliable and powerful' and not 'weak and
 problematic'.  This awakens too many reminiscences of issues
 like eugenics, superior races, nazism, ethnic cleansing and
 the like, not to mention possible offence to persons concerned
  about people with disabilities. I assume Dr. Roy was unaware of
 these implications but nevertheless I request from him a
 clarification of this particular point.
 
 Let me add that much has been learned about language, vision and
 brain mechanisms in general, by studying disabled or handicapped
 persons. Neuroscientists have to be grateful to this group of
 fellow human beings and should reciprocate.
 
 Dinosaurs were in many ways much more powerful than their weak
 contemporaries, the problematic and unreliable mammals. However,
 evolution has not been kind to dinosaurs. Nobody knows about
 forthcoming surprises.
 
 Engineering is often defined as the use of scientific knowledge
 to satisfy social and individual human needs. A remarkably
 successful evolutionary strategy adopted by many species is
 cooperative behaviour. For us human beings, this implies not
 only concern about the well being of other humans and of society
 in general but also solidarity with the less gifted and the
weak. Instead of just abandoning or destroying persons with
 neurological damage (or with any illness) considerable medical
effort is spent on them. Engineers are involved in the creation of
the most diverse prostheses. The powerful brains of neuroscientists
 and neuroengineers are not alien to the needs of society. They
 should and will, with extreme dedication and in a most
 constructive way, concern themselves with brains considered less
 powerful.
 
For all these reasons I think that care is needed when referring
 to less fortunate persons, particularly in a context that, even
 if not intended, could be perceived as disrespect or disregard
 for them, or as eulogistic to ideologies that have already
 produced considerable human suffering.
 
 These critical remarks and comments on a non-technical question
 of wording are made respectfully, with a sense of duty and expecting
to settle the matter without raising major issues.
 
 *** END OF COMMENT 2 OF 6 ***
 (This is a comment on Task C: Quickness in Learning.)
 
# Comment (3 of 6) to C: The methods of [1] and [2] are
certainly very fast. And they can in principle learn from a pair
of sets with single elements: A={a} and B={b}. Also, since in
general the polyhedron R depends on A and B, the network retains
a certain 'memory' of both A and B. Furthermore, the network
built around a set of 10 examples and 10 counterexamples (a 10-10
NN) will be, generally speaking, different from a 1000-1000 NN.
What is learned (i.e. the resulting NN) depends on the data set and,
because of under- and over-fitting, comparison of the resulting
networks is not obvious. There does not seem to be an obvious
criterion to tell when the extra examples are waste.
 
 *** END OF COMMENT 3 OF 6 ***
(This is a comment on Task D: Efficiency in Learning.)
 
# Comment (4 of 6) to D: The algorithms of [1] and [2] are
extremely fast. If |A|=r (A has r elements) and |B|=s then the
number of operations is polynomial in rs. From the viewpoint of
 the papers abstracted above the whole subject of complexity of
 the NN's and the algorithms is rather extensive and requires a
 separate paper.
 
 *** END OF COMMENT 4 OF 6 ***
 (This is a comment on Task E: Generalization in Learning.)
 
# Comment (5 of 6) to E: The algorithms can design a rather small
 network. However, to find for given data sets A and B, the
 absolutely smallest possible net within the class of networks
 the algorithm designs looks like a hard problem. An exhaustive
 search seems necessary and this involves permuting the order in
which rs linear forms are processed, requiring time exponential
in rs. But for a particular given order the algorithm efficiently
 gives the best solution. Other, not yet explored, approaches to
 the best network can be attempted but this is the subject of
further substantial research.
 
 *** END OF COMMENT 5 OF 6 ***
 
 # Comment (6 of 6) to General Comments: It is not clear that "the
 brain itself designs the network for the brain". Learning behaviour
results from the complex interaction of evolutionary, nutritional,
genetic, familial, social, economic and other factors, to say the least.
 
 An additional desirable property for any NN program,
 already implied by the comments but not explicitly mentioned by
 Dr. Roy, is the question of proper fitting. This seems insoluble
or not well posed. See [2] above. But with regard to this point
it is possible to establish some comparisons between the classical
perceptrons and the NN's built with radial basis functions.
Perceptron processing units are characteristic functions of
 (non-homogeneous) linear half-spaces. They split R^n into two
 symmetric halves, each with infinite volume; the complement
 of a half space is another half space. Since a given half space
 is isometric to its complement, as much is put inside as left
 outside. On the other hand, radial basis functions are characteristic
maps of hyperballs. The inside of a hyperball has finite volume while
 the outside has infinite volume (in R^n). These arguments are
 rather heuristic, but they seem to indicate that underfitting
 problems, overfitting problems, or both, will appear more often
in radial basis NN than in comparable classical perceptron NN.
 Given the nature of the issue, performance tests are necessary
 to decide the relative merits of classical perceptrons vs.
 radial basis.
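 
The contrast can be written down directly (a minimal sketch; the particular weights, centre and radius are arbitrary choices):
 
    import numpy as np

    def halfspace_unit(x, w=np.array([1.0, -1.0]), b=0.2):
        # Characteristic function of a (non-homogeneous) linear half-space:
        # both the inside and the outside are unbounded regions.
        return float(np.dot(w, x) + b > 0)

    def ball_unit(x, centre=np.array([0.0, 0.0]), radius=1.0):
        # Characteristic map of a hyperball: the inside has finite volume,
        # the outside is unbounded.
        return float(np.linalg.norm(x - centre) < radius)

    for point in (np.array([2.0, 0.0]), np.array([0.1, 0.1])):
        print(point, halfspace_unit(point), ball_unit(point))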
 
Motivation for the new NN program proposed by Dr. Roy seems to
be, at least partially, the biology of the brain. But opposite
arguments can also be based on biology. Just to show a few more
 facets of the problem the following can be said. According to
 current scientific belief biological neural systems have been
 subject to hundreds of millions of years (this could be
 polynomial time, but not in a convincing way) of natural
 selection, a most basic evolutionary mechanism. But if current
 evolutionary models are correct this mechanism is some sort of
 local optimization rule. The assumption that Mother Nature has
 some kind of global mechanism or knowledge to foresee the long
 term result of its selective activity is not considered orthodox
and could even be heretical, for certain purists at least. If it
 is true that the brain uses a global strategy then a paradox
 appears: The locally acting mechanism of evolution has
  produced a globally optimizing brain.
 
 The general techniques to address the issues A), B), C), D) and
E) are already available in [1]-[4]. They are a natural consequence
 of Classical Connectionism and Neural Network Theory.
 What should perhaps be necessary is to translate the mathematical
 formulas in which the techniques are expressed into a language more
 familiar to cognitive scientists and computer scientists.
 
Finally, and as a certain deep thinker said some time ago, we are just
children playing with a few pebbles on the seashore. Having such
 a limited knowledge, how can we be sure our accomplishments or
 goals are global optima? In more technical parlance, knowing
 only a finite and bounded part of the infinite domain, and only
 finitely many values of the function, we cannot say that what we
 look at is a global optimum. With regard to geological times, to
 nature, to evolution or to the cosmos, our human optima are
 always local and relative.
 
 I hope these remarks could be of interest to all concerned
persons and particularly to the originator of this discussion,
 Dr. Asim Roy."
********************************************************
********************************************************
**************************************************
***************************************************
APPENDIX
 
We have recently published a set of principles for learning in neural
networks/connectionist models that is different from classical
connectionist learning (Neural Networks, Vol. 8, No. 2; IEEE
Transactions on Neural Networks, to appear; see references
below). Below is a brief summary of the new learning theory and
why we think classical connectionist learning, which is
characterized by pre-defined nets, local learning laws and
memoryless learning (no storing of training examples for learning),
is not brain-like at all. Since vigorous and open debate is very
healthy for a scientific field, we invite comments for and against our
ideas from all sides.
 
 
"A New Theory for Learning in Connectionist Models"
 
We believe that a good rigorous theory for artificial neural
networks/connectionist models should include learning methods
that perform the following tasks or adhere to the following criteria:
 
A. Perform Network Design Task: A neural network/connectionist
learning method must be able to design an appropriate network for
a given problem, since, in general, it is a task performed by the
brain. A pre-designed net should not be provided to the method as
part of its external input, since it never is an external input to the
brain. From a neuroengineering and neuroscience point of view, this
is an essential property for any "stand-alone" learning system - a
system that is expected to learn "on its own" without any external
design assistance.
 
B. 	Robustness in Learning: The method must be robust so as
not to have the local minima problem, the problems of oscillation
and catastrophic forgetting, the problem of recall or lost memories
and similar learning difficulties. Some people might argue that
ordinary brains, and particularly  those with learning disabilities, do
exhibit such problems and that these learning requirements are the
attributes only of a "super" brain. The goal of neuroengineers and
neuroscientists is to design and build learning systems that are
robust, reliable and powerful. They have no interest in creating
weak and problematic learning devices that need constant attention
and intervention.
 
C. 	Quickness in Learning: The method must be quick in its
learning and learn rapidly from only a few examples, much as
humans do. For example, one which learns from only 10 examples
learns faster than one which requires 100 or 1,000 examples. We
have shown that on-line learning (see references below),  when not
allowed to store training examples in memory, can be extremely
slow in learning - that is, would require many more examples to
learn a given task compared to methods that use memory to
remember training examples. It is not desirable that a neural
network/connectionist learning system be similar in characteristics
to learners characterized by such sayings as "Told him a million
times and he still doesn't understand." On-line learning systems
must learn rapidly from only a few examples.
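 
One way to see the point about on-line learning without memory is a small perceptron experiment: a single pass over a handful of examples (each example seen once and then discarded) versus many passes over the same stored examples. Everything in this sketch - the data distribution, the perceptron rule, the constants - is an illustrative assumption, not the algorithms from the references below.
 
    import numpy as np

    rng = np.random.default_rng(0)
    w_true = np.array([1.0, -2.0, 0.5])      # an invented target concept

    def make_data(n):
        X = rng.normal(size=(n, 3))
        return X, np.sign(X @ w_true)

    def train(X, y, epochs):
        # Perceptron updates. epochs=1 is "memoryless" on-line learning: each
        # example is seen once and then discarded. epochs>1 re-uses the stored
        # training examples over and over.
        w = np.zeros(3)
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                if np.sign(w @ xi) != yi:
                    w += yi * xi
        return w

    X_test, y_test = make_data(2000)
    X, y = make_data(30)                      # only a few training examples
    for epochs in (1, 50):
        w = train(X, y, epochs)
        acc = float(np.mean(np.sign(X_test @ w) == y_test))
        print(epochs, "pass(es) over the 30 examples: test accuracy", round(acc, 3))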
 
D. 	Efficiency in Learning: The method must be
computationally efficient in its learning when provided with a finite
number of training examples (Minsky and Papert [1988]). It must be
able to both design and train an appropriate net in polynomial time.
That is, given P examples, the learning time (i.e. both design and
training time) should be a polynomial function of P. This, again, is a
critical computational property from a neuroengineering and
neuroscience point of view.  This property has its origins in the
belief that  biological systems (insects, birds for example) could not
be solving NP-hard problems, especially when efficient, polynomial
time learning methods can conceivably be designed and developed.
 
E. 	Generalization in Learning: The method must be able to
generalize reasonably well so that only a small amount of network
resources is used. That is, it must try to design the smallest possible
net, although it might not be able to do so every time. This must be
an explicit part of the algorithm. This property is based on the
notion that the brain could not be wasteful of its limited resources,
so it must be trying to design the smallest possible net for every
task.
 
 
General Comments
 
This theory defines algorithmic characteristics that are obviously
much more brain-like than those of classical connectionist theory,
which is characterized by pre-defined nets, local learning laws and
memoryless learning (no storing of actual training examples for
learning). Judging by the above characteristics, classical
connectionist learning is not very powerful or robust. First of all, it
does not even address the issue of network design, a task that
should be central to any neural network/connectionist learning
theory. It is also plagued by efficiency (lack of polynomial time
complexity, need for an excessive number of teaching examples) and
robustness problems (local minima, oscillation, catastrophic
forgetting, lost memories), problems that are partly acquired from
its attempt to learn without using memory. Classical connectionist
learning, therefore, is not very brain-like at all.
 
As far as I know, there is no biological evidence for any of the
premises of classical connectionist learning. Without having to
reach into biology, simple common sense arguments can show that
the ideas of local learning, memoryless learning and predefined nets
are impractical even for the brain! For example, the idea of local
learning requires a predefined network. Classical connectionist
learning forgot to ask a very fundamental question - who designs
the net for the brain? The answer is very simple: Who else, but the
brain itself! So, who should construct the net for a neural net
algorithm? The answer again is very simple: Who else, but the
algorithm itself! (By the way, this is not a criticism of constructive
algorithms that do design nets.) Under classical connectionist
learning, a net has to be constructed (by someone, somehow - but
not by the algorithm!) prior to having seen a single training
example! I cannot imagine any system, biological or otherwise,
being able to construct a net with zero information about the
problem to be solved and with no knowledge of the complexity of
the problem. (Again, this is not a criticism of constructive
algorithms.)
 
A good test for a so-called "brain-like" algorithm is to imagine it
actually being part of a human brain. Then examine the learning
phenomenon of the algorithm and compare it with that of the
human's. For example, pose the following question: If an algorithm
like back propagation is "planted" in the brain, how will it behave?
Will it be similar to human behavior in every way? Look at the
following simple "model/algorithm" phenomenon when the back-
propagation algorithm is "fitted" to a human brain. You give it a
few learning examples for a simple problem and after a while this
"back prop fitted" brain says: "I am stuck in a local minimum. I
need to relearn this problem. Start over again." And you ask:
"Which examples should I go over again?" And this "back prop
fitted" brain replies: "You need to go over all of them. I don't
remember anything you told me." So you go over the teaching
examples again. And let's say it gets stuck in a local minimum again
and, as usual, does not remember any of the past examples. So you
provide the teaching examples again and this process is repeated a
few times until it learns properly. The obvious questions are as
follows: Is "not remembering" any of the learning examples a brain-
like phenomenon? Are the interactions with this so-called "brain-
like" algorithm similar to what one would actually encounter with a
human in a similar situation? If the interactions are not similar, then
the algorithm is not brain-like. A so-called brain-like algorithm's
interactions with the external world/teacher cannot be different
from that of the human.
 
In the context of this example, it should be noted that
storing/remembering relevant facts and examples is very much a
natural part of the human learning process. Without the ability to
store and recall facts/information and discuss, compare and argue
about them, our ability to learn would be in serious jeopardy.
Information storage facilitates mental comparison of facts and
information and is an integral part of rapid and efficient learning. It
is not biologically justified when "brain-like" algorithms disallow
usage of memory to store relevant information.
 
Another typical phenomenon of classical connectionist learning is
the "external tweaking" of algorithms. How many times do we
"externally tweak" the brain (e.g. adjust the net, try a different
parameter setting) for it to learn? Interactions with a brain-like
algorithm have to be brain-like indeed in all respects.
 
The learning scheme postulated above does not specify how
learning is to take place - that is, whether memory is to be used  or
not to store training examples for learning, or whether learning is to
be through local learning at each node in the net or through some
global mechanism. It merely defines broad computational
characteristics and tasks (i.e. fundamental learning principles) that
are brain-like and that all neural network/connectionist algorithms
should follow. But there is complete freedom otherwise in
designing the algorithms themselves. We have shown that robust,
reliable learning algorithms can indeed be developed that satisfy
these learning principles (see references below). Many constructive
algorithms satisfy many of the learning principles defined above.
They can, perhaps, be modified to satisfy all of the learning
principles.
 
The learning theory above defines computational and learning
characteristics that have always been desired by the neural
network/connectionist field. It is difficult to argue that these
characteristics are not "desirable," especially for self-learning, self-
contained systems.  For neuroscientists and neuroengineers, it
should open the door to development of brain-like systems they
have always wanted - those that can learn on their own without any
external intervention or assistance, much like the brain. It essentially
tries to redefine the nature of algorithms considered to be brain-
like. And it defines the foundations for developing truly self-
learning systems - ones that wouldn't require constant intervention
and tweaking by external agents (human experts) in order to learn.
 
It is perhaps time to reexamine the foundations of the neural
network/connectionist field. This mailing list/newsletter provides an
excellent opportunity for participation by all concerned throughout
the world. I am looking forward to a lively debate on these matters.
That is how a scientific field makes real progress.
 
 
Asim Roy
Arizona State University
Tempe, Arizona 85287-3606, USA
Email: ataxr at asuvm.inre.asu.edu
 
 
References
 
1.  Roy, A., Govil, S. & Miranda, R. 1995. A Neural Network
Learning Theory and a Polynomial Time RBF Algorithm. IEEE
Transactions on Neural Networks, to appear.
 
2.  Roy, A., Govil, S. & Miranda, R. 1995. An Algorithm to
Generate Radial Basis Function (RBF)-like Nets for Classification
Problems. Neural Networks, Vol. 8, No. 2, pp. 179-202.
 
3.  Roy, A., Kim, L.S. & Mukhopadhyay, S. 1993. A Polynomial
Time Algorithm for the Construction and Training of a Class of
Multilayer Perceptrons. Neural Networks, Vol. 6, No. 4, pp. 535-
545.
 
4.  Mukhopadhyay, S., Roy, A., Kim, L.S. & Govil, S. 1993. A
Polynomial Time Algorithm for Generating Neural Networks for
Pattern Classification - its Stability Properties and Some Test
Results. Neural Computation, Vol. 5, No. 2, pp. 225-238.


