Speech and Chaos--Summary

Harry Erwin erwin at trwacs.fp.trw.com
Thu Feb 4 08:52:34 EST 1993


Summary and continuation:

>Path: trwacs.fp.trw.com!trwacs!erwin
>From: erwin at trwacs.fp.trw.com (Harry Erwin)
>Subject: Speech and Chaos
>Message-ID: <erwin.728417704 at trwacs>
>Organization: TRW Systems Division, Fairfax VA
>Date: Sat, 30 Jan 1993 18:15:04 GMT


I'm exploring the possibility that speech may be a Pecora-Carroll process
that synchronizes two perturbed quasi-periodic processes, one serving to
select the utterance (at the deep grammar level) and the other serving to
track the utterance.

A Pecora-Carroll process is used to synchronize chaotic processes. The
dynamics of the chaotic process are decomposed into two components, one
transmitted between the two chaotic processes and the other duplicated at
the individual chaotic processes. It is necessary and sufficient for
synchronization that the conditional Lyapunov exponents for the variables in the
local (duplicated) components be negative. (This means that the variables in the
transmitted component must contain all periodic and chaotic variables
in the dynamics.) The originators of this concept are Lou Pecora and Tom
Carroll at NRL. Their papers have appeared in a number of IEEE journals.
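
To make the mechanism concrete, here is a minimal numerical sketch of
Pecora-Carroll synchronization (in Python). The Lorenz system, the choice of
x as the transmitted variable, and the parameter values are simply my
illustration, not an example taken from their papers:

import numpy as np

sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0   # standard Lorenz parameters
dt, steps = 1e-3, 200_000                  # simple Euler integration

# Drive system: the full Lorenz dynamics (x, y, z).
xd, yd, zd = 1.0, 1.0, 1.0
# Response system: only the (y, z) subsystem is duplicated locally;
# the x variable is the "transmitted" component.
yr, zr = -5.0, 20.0

for _ in range(steps):
    dxd = sigma * (yd - xd)
    dyd = xd * (rho - zd) - yd
    dzd = xd * yd - beta * zd
    # The response integrates its own copy of (y, z), driven by the drive's x.
    dyr = xd * (rho - zr) - yr
    dzr = xd * yr - beta * zr
    xd += dt * dxd; yd += dt * dyd; zd += dt * dzd
    yr += dt * dyr; zr += dt * dzr

# The conditional Lyapunov exponents of the driven (y, z) subsystem are
# negative, so the response converges onto the drive.
print(f"|yd - yr| = {abs(yd - yr):.2e}, |zd - zr| = {abs(zd - zr):.2e}")

If the sketch is right, the printed differences should come out tiny compared
with the size of the attractor, even though the two systems start far apart.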

Steve Barry suggests some questions:
1. Has any work yet been done on localizing deep-grammar processing, and is
the region the same for speech and hearing?
2. In studies of shadow speech, what are the limits to the time delay
between hearing the utterance and shadowing it? How about a paraphrase (or
simultaneous translation) of the utterance?

If speech is a P-C process, the driven process in the listener should have
negative conditional Lyapunov exponents. This could be explored using
methods similar to those of Walt Freeman.
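
A rough way to check that condition numerically, again under the illustrative
Lorenz decomposition above rather than Freeman's methods: run two copies of
the driven subsystem with the same drive signal from slightly different
initial conditions and measure the average exponential rate at which their
separation grows or shrinks:

import numpy as np

sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0
dt, steps, renorm_every = 1e-3, 200_000, 100
d0 = 1e-8                                  # reference separation

xd, yd, zd = 1.0, 1.0, 1.0                 # drive: full Lorenz system
ra = np.array([-5.0, 20.0])                # response copy A: (y, z)
rb = ra + np.array([d0, 0.0])              # response copy B: perturbed copy

def resp_deriv(x_drive, y, z):
    # (y, z) subsystem of the Lorenz equations, driven by an external x.
    return np.array([x_drive * (rho - z) - y, x_drive * y - beta * z])

log_sum, n_renorm = 0.0, 0
for i in range(1, steps + 1):
    dxd = sigma * (yd - xd)
    dyd = xd * (rho - zd) - yd
    dzd = xd * yd - beta * zd
    ra = ra + dt * resp_deriv(xd, *ra)
    rb = rb + dt * resp_deriv(xd, *rb)
    xd += dt * dxd; yd += dt * dyd; zd += dt * dzd
    if i % renorm_every == 0:
        d = np.linalg.norm(rb - ra)
        log_sum += np.log(d / d0)
        n_renorm += 1
        rb = ra + (rb - ra) * (d0 / d)     # rescale the separation back to d0

lam = log_sum / (n_renorm * renorm_every * dt)
print(f"estimated largest conditional Lyapunov exponent: {lam:.3f}")
# A negative estimate is consistent with the synchronization condition above.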

It would be interesting to put together a dynamic, multi-level model of
speech and hearing to see if a test model compatible with a quasi-periodic
implementation is feasible.

Cheers,
-- 
Harry Erwin
Internet: erwin at trwacs.fp.trw.com


Tom Holroyd's Response:

>From: tomh at BAMBI.CCS.FAU.EDU (Tom Holroyd)
>Newsgroups: bionet.neuroscience
>Subject: Re:  Speech and Chaos
>Message-ID: <9302011527.AA15403 at bambi.ccs.fau.edu>
>Date: 1 Feb 93 15:27:50 GMT
>Sender: daemon at net.bio.net
>Distribution: bionet
>Lines: 33

You should try modelling delayed auditory feedback (DAF).  In the typical
experiment, a person speaks into a microphone, the speech is delayed by 200
to 250 ms or so, and it is played back into headphones the speaker is wearing.
The disruption of the speaker's speech is a very robust phenomenon.  There is
a maximally disruptive delay (around 200-250 ms) at which it is almost
impossible to speak - the auditory feedback interferes with
the production of speech in a significant way.  Speech *can* proceed
normally with no auditory feedback, much like a deafferented arm can
move to the correct target (i.e. only feedforward processing).  But
normally there is feedback with an appropriate (short) delay.  DAF
experiments show that this feedback plays a role in speech production.
When there is no auditory feedback, the speaker undoubtedly uses an
anticipatory model to predict the feedback and does just fine.  But
the real DAF screws the production process up.

This is not so deep as grammar, but one can certainly view speech
communication as: the speaker's speech signal acts as a perturbation
on the listener's dynamical system.  This view is supported in part by
the relative lack of information in the speech signal itself.  Speech
can be digitized down to 1 bit at 16 kHz and still be recognized and
understood.  Noise can be added, etc.  Much of the "information" transmitted
is already in the listener's head, and is simply activated by the
speech signal.
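
A quick sketch of what "1 bit at 16 kHz" means in raw bit-rate terms; the
synthetic two-tone waveform below is only a stand-in for a real speech
recording, and the whole thing is just an illustration of the arithmetic:

import numpy as np

fs = 16_000                                # 16 kHz sample rate
t = np.arange(0, 1.0, 1.0 / fs)            # one second of signal
# Synthetic two-tone stand-in for a speech waveform.
speech = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

one_bit = np.sign(speech)                  # 1-bit quantization: keep only the sign
print(f"raw bit rate: {fs} bits/s, i.e. each bit lasts {1e6 / fs:.1f} microseconds")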

So the question would be, how does a P-C process react when the output
is coupled back to the input with a delay?
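
One way to make that question concrete, sticking with the illustrative Lorenz
decomposition used earlier in this thread: feed a delayed, saturated copy of
the response's own output back into its driving signal. The feedback gain,
the tanh saturation (there only to keep the toy model bounded), and the
reading of one model time unit as one second are all assumptions of the
sketch, not claims about speech:

import numpy as np
from collections import deque

sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0
dt, steps = 1e-3, 200_000
delay_steps = int(0.225 / dt)     # "225 ms" delay, in integration steps
fb_gain = 5.0                     # strength of the delayed self-feedback (0 = pure P-C)

xd, yd, zd = 1.0, 1.0, 1.0        # external drive: full Lorenz system
yr, zr = -5.0, 20.0               # response: locally duplicated (y, z) subsystem
history = deque([0.0] * delay_steps, maxlen=delay_steps)
errors = []

for _ in range(steps):
    # Driving signal = transmitted x plus a delayed, saturated copy of the
    # response's own y output (the "delayed auditory feedback" analogue).
    drive = xd + fb_gain * np.tanh(history[0] / 10.0)
    history.append(yr)

    dxd = sigma * (yd - xd)
    dyd = xd * (rho - zd) - yd
    dzd = xd * yd - beta * zd
    dyr = drive * (rho - zr) - yr
    dzr = drive * yr - beta * zr

    xd += dt * dxd; yd += dt * dyd; zd += dt * dzd
    yr += dt * dyr; zr += dt * dzr
    errors.append(abs(yd - yr))

print(f"mean |yd - yr| over the last 10% of the run: {np.mean(errors[-steps // 10:]):.3f}")
# With fb_gain = 0 this error decays to essentially zero (clean synchronization);
# raising fb_gain and varying delay_steps lets one probe when the delayed
# self-feedback destroys synchronization, loosely analogous to DAF disrupting speech.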

Tom Holroyd
Center for Complex Systems and Brain Sciences
Florida Atlantic University, Boca Raton, FL 33431 USA
tomh at bambi.ccs.fau.edu


William Calvin's Response:

>From: wcalvin at stein.u.washington.edu (William Calvin)
>Newsgroups: sci.cognitive,bionet.neuroscience,comp.ai.philosophy
>Subject: Re: Speech and Chaos
>Date: 30 Jan 1993 23:41:15 GMT
>Organization: University of Washington
>Lines: 58
>Message-ID: <1kf3mrINNlko at shelley.u.washington.edu>
>References: <erwin.728417704 at trwacs>
>NNTP-Posting-Host: stein.u.washington.edu

erwin at trwacs.fp.trw.com (Harry Erwin) writes:


>[Harry Erwin's original post, quoted in full above]

Good set of questions.  Nothing localizes "deep grammar" very well, but we
can say something about neural sequencing specializations that seem common
to both listening and production of speech; see the article in Behavioral
and Brain Sciences 1983 by George Ojemann on using electrical stimulation
of the cortex of awake epileptics during surgery, testing language and
related functions.  George and I are writing a new popular book to replace
INSIDE THE BRAIN (1980, now out of print) but it won't be out until 1994;
my book THE CEREBRAL SYMPHONY has some of this in it.
	On the more detailed level of coupling chaotic processes that might
correspond to some of the bits and pieces (phonemes, words, and phrases),
I've been thinking some about sequencing strange attractors.
	Re shadowing speech, I don't know the latencies.  But there is some
fascinating folklore on simultaneous translators, e.g., one school in
Belgium that teaches its students to knit while translating, thus
producing a background rhythmic carrier?
    William H. Calvin   WCalvin at U.Washington.edu
    University of Washington  NJ-15
    Seattle, Washington 98195 FAX:1-206-720-1989


My response:

>Path: trwacs.fp.trw.com!trwacs!erwin
>From: erwin at trwacs.fp.trw.com (Harry Erwin)
>Subject: Re: Speech and Chaos
>Message-ID: <erwin.728657562 at trwacs>
>Organization: TRW Systems Division, Fairfax VA
>References: <9302011527.AA15403 at bambi.ccs.fau.edu>
>Distribution: bionet
>Date: Tue, 2 Feb 1993 12:52:42 GMT

You suggest that I should look at delayed auditory feedback (DAF) as a
phenomenon to be modelled in my chaotic model of speech. Short-term DAF
(which is used in speech production) probably corresponds to the sort of
short-term feedback that the cerebellum uses to fine-tune motor actions.
You point out that longer-term DAF (200-250 msec) interferes with speech
production, disrupting it badly. This raises two questions: what is the
nature of the disruption, and what is its onset? I suspect there's an echo
threshold involved. A similar process has been seen in visual attention.

Some of my questions will have to wait on my working out a detailed model.

Your evidence that speech can be understood when digitized at one bit at 16
kHz is interesting. How long does each bit transmission last? One reason
I'm thinking in this direction is that music harmony has many of the same
characteristics as speech, without the concern about transmitting large
amounts of information. Hence it may be a better model to study than
speech itself.

Cheers,
-- 
Harry Erwin
Internet: erwin at trwacs.fp.trw.com

Steve Barry has some additional points:

1. What characterizes the onset of speech disruption by DAF?
2. What happens when you do a spectrum shift on the sound before
   you play it back?

