Technical Report Series in Neural and Computational Learning

John Shawe-Taylor
Fri Oct 4 04:08:41 EST 1996

The European Community ESPRIT Working Group in Neural and Computational 
Learning Theory (NeuroCOLT) has produced a set of new Technical Reports
available from the remote ftp site described below. They cover topics in
real-valued complexity theory, computational learning theory, and analysis
of the computational power of continuous neural networks.  Abstracts are
included for the titles.

NeuroCOLT Technical Report NC-TR-96-049:
Extended Grzegorczyk Hierarchy in the BSS Model of Computability
by  Jean-Sylvestre Gakwaya, Universit\'e de Mons-Hainaut, Belgium

In this paper, we give an extension of the Grzegorczyk hierarchy to the
BSS theory of computability, which is a generalization of the classical
theory, and we adapt some classical results related to the Grzegorczyk
hierarchy to the new setting.
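For background, the classical hierarchy being extended here stratifies
the primitive recursive functions; in a standard formulation (not taken
from the report),

\[ {\cal E}^0 \subseteq {\cal E}^1 \subseteq {\cal E}^2 \subseteq \cdots,
   \qquad \bigcup_{n \geq 0} {\cal E}^n = \mbox{PR}, \]

where ${\cal E}^n$ is the smallest class containing zero, successor and
the projections together with the $n$-th principal growth function,
closed under composition and bounded recursion; ${\cal E}^3$ is the
class of Kalm\'ar elementary functions. The report lifts this
stratification to machines over the reals in the sense of Blum, Shub
and Smale.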

NeuroCOLT Technical Report NC-TR-96-050:
Learning from Examples and Side Information
by  Joel Ratsaby, Technion, Israel
    Vitaly Maiorov, Technion, Israel

We set up a theoretical framework for learning from examples and side
information which enables us to compute the tradeoff between the sample
complexity and information complexity for learning a target function in
a Sobolev functional class $\cal F$.  We use the notion of the {\em
$n^{th}$ minimal radius of information} of Traub et al. \cite{traub}
and combine it with VC-theory to define a new quantity $I_{n,d}({\cal
F})$ which measures the minimal approximation error of a target $g\in
{\cal F}$ by the family of function classes with pseudo-dimension $d$
under given side information consisting of any $n$ measurements on
the target function $g$, constrained to be linear operators. By
obtaining almost tight upper and lower bounds on $I_{n,d}({\cal F})$,
we find an information operator $\hat{N}_n$ whose worst-case error
exceeds the lower bound on $I_{n,d}({\cal F})$ by at most a factor
logarithmic in $n$ and $d$. Hence, to within a logarithmic factor, it
is the most efficient way of providing side information about a target
$g$ under the constraint that the information operator must be linear
and that the approximating class has pseudo-dimension $d$.
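Schematically (the precise definition is in the report), the quantity
combines information-based complexity with VC-theory:

\[ I_{n,d}({\cal F}) \;=\;
   \inf_{N_n\ \mbox{\scriptsize linear}} \;
   \sup_{g \in {\cal F}} \;
   \inf_{{\cal H}:\ \mbox{\scriptsize pdim}({\cal H}) \leq d} \;
   \inf_{h \in {\cal H}} \; \| g - h \|, \]

where $N_n$ ranges over information operators consisting of $n$ linear
measurements of $g$, and ${\cal H}$ over function classes of
pseudo-dimension at most $d$, possibly chosen in response to the
observed information $N_n(g)$.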

NeuroCOLT Technical Report NC-TR-96-051:
Complexity and Dimension
by  Felipe Cucker, City University of Hong Kong
    Pascal Koiran, \'Ecole Normale Sup\'erieure, Lyon, France
    Martin Matamala, Universidad de Chile, Chile

In this note we define a notion of sparseness for subsets of $\Ri$ 
and we prove that there are no sparse $\NPadd$-hard sets. Here we deal
with additive machines which branch on equality tests of the form $x=y$
and $\NPadd$ denotes the corresponding class of sets decidable in
nondeterministic polynomial time. Note that this result implies the
already known separation $\Padd\not=\NPadd$.
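For comparison, in the classical Boolean setting a set
$S \subseteq \Sigma^*$ is sparse when

\[ \exists\ \mbox{polynomial } p\ \ \forall n:\ \
   |S \cap \Sigma^{\leq n}| \leq p(n), \]

and Mahaney's theorem states that a sparse NP-hard set (under
polynomial-time many-one reductions) exists only if P $=$ NP. The
result announced above is an unconditional analogue over the reals for
additive machines.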

NeuroCOLT Technical Report NC-TR-96-052:
Semi-Algebraic Complexity -- Additive Complexity of Diagonalization
of Quadratic Forms
by  Thomas Lickteig, Universit\"at Bonn, Germany
    Klaus Meer, RWTH Aachen, Germany

Abstract (for references see full paper):
We study matrix calculations such as the diagonalization of quadratic
forms under the aspect of additive complexity, and relate these
complexities to the complexity of matrix multiplication. While in
\cite{BKL} the customary ``thick path existence'' argument sufficed
for multiplicative complexity, here, for additive complexity, we need
the more delicate finesse of the real spectrum (cf. \cite{BCR},
\cite{Be}, \cite{KS}) to obtain a complexity relativization. After its
outstanding success in semi-algebraic geometry, the power of the real
spectrum method in complexity theory is becoming more and more
apparent. Our discussions substantiate once more the significance and
future r\^ole of this concept in the mathematical evolution of the
field of real algebraic algorithmic complexity.
A further technical tool concerning additive complexity is the
structural transport metamorphosis from \cite{Li1}, which constitutes
another use of exponentiation and logarithm as they appear in the work
on additive complexity of \cite{Gr} and \cite{Ri}.
We confine ourselves here to diagonalization of quadratic forms. In
the forthcoming paper \cite{LM}, further such relativizations of
additive complexity will be given for a series of matrix computational
problems.

NeuroCOLT Technical Report NC-TR-96-053:
Structural Risk Minimization over Data-Dependent Hierarchies
by  John Shawe-Taylor, Royal Holloway, University of London, UK
    Peter Bartlett, Australian National University, Australia
    Robert Williamson, Australian National University, Australia
    Martin Anthony, London School of Economics, UK

The paper introduces some generalizations of Vapnik's method of
structural risk minimization (SRM). As well as making explicit some of
the details of SRM, it provides a result that allows one to trade off
errors on the training sample against improved generalization
performance.  It then considers the more general case when the
hierarchy of classes is chosen in response to the data. A result is
presented on the generalization performance of classifiers with a
``large margin''.  This theoretically explains the impressive
generalization performance of the maximal margin hyperplane algorithm
of Vapnik and co-workers (which is the basis for their support vector
machines).  The paper concludes with a more general result in terms of
``luckiness'' functions, which provides a quite general way for
exploiting serendipitous simplicity in observed data to obtain better
prediction accuracy from small training sets.  Four examples are given
of such functions, including the VC dimension measured on the sample.
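For a linear threshold classifier, the margin in question is the
standard geometric quantity (a textbook formulation, not quoted from
the report):

\[ \gamma \;=\; \min_{1 \leq i \leq m}
   \frac{y_i \left( \langle w, x_i \rangle + b \right)}{\| w \|}, \]

the smallest distance from a correctly classified training point
$(x_i, y_i)$, $y_i \in \{-1, +1\}$, to the separating hyperplane
$\langle w, x \rangle + b = 0$; the maximal margin hyperplane of
Vapnik and co-workers maximizes $\gamma$ over $(w, b)$.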

NeuroCOLT Technical Report NC-TR-96-054:
Confidence Estimates of Classification Accuracy on New Examples
by  John Shawe-Taylor, Royal Holloway, University of London, UK

Following recent results (NeuroCOLT Technical Report NC-TR-96-053)
showing the importance of the fat shattering dimension in explaining
the beneficial effect of a large margin on generalization performance,
the current paper investigates how the margin on a test example can be
used to give greater certainty of correct classification in the
distribution independent model.  The results show that even if the
classifier does not classify all of the training examples correctly,
the fact that a new example has a larger margin than that on the
misclassified examples can be used to give very good estimates of the
generalization performance in terms of the fat shattering dimension
measured at a scale proportional to the excess margin. The estimate
relies on a sufficiently large number of the correctly classified
training examples having a margin roughly equal to that used to
estimate generalization, indicating that the corresponding output
values need to be `well sampled'.
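The fat shattering dimension used above can be stated as follows (a
standard definition, not quoted from the report): a sequence
$x_1, \ldots, x_m$ is $\gamma$-shattered by a class $F$ if there exist
reals $r_1, \ldots, r_m$ such that for every $b \in \{0,1\}^m$ some
$f \in F$ satisfies

\[ f(x_i) \geq r_i + \gamma \ \mbox{ if } b_i = 1, \qquad
   f(x_i) \leq r_i - \gamma \ \mbox{ if } b_i = 0; \]

$\mathrm{fat}_F(\gamma)$ is then the length of the longest
$\gamma$-shattered sequence.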

----------------------------------------------------------------------

***************** ACCESS INSTRUCTIONS ******************

The Report NC-TR-96-001 can be accessed and printed as follows:

% ftp
Name: anonymous
password: your full email address
ftp> cd pub/neurocolt/tech_reports
ftp> binary
ftp> get
ftp> bye
% zcat | lpr -l

Similarly for the other technical reports.

Uncompressed versions of the postscript files have also been
left for anyone not having an uncompress facility. 

In some cases there are two files available for a single report: the
first contains the title page, while the second contains the body of
the report. The single command
ftp> mget nc-tr-96-002*
will prompt you for the files you require.

A full list of the currently available Technical Reports in the 
Series is held in a file `abstracts' in the same directory.

The files may also be accessed via WWW, starting from the NeuroCOLT
home page or going directly to the archive.

Best wishes
John Shawe-Taylor
