self-modifying code [Re: TOP 10 (names in comp.sci)]

Jeffrey S. Dutky dutky at bellatlantic.net
Sun Aug 9 13:02:15 EST 1998


Andrea Chen wrote:
> I also think it possible to build a machine with literally tens
> of thousands of associative members in which this high level
> structure can contain much of the problem.

The problem with building very large associative memories is that
most of your silicon goes into building comparators. Essentially,
you are building one huge parallel processor that has only one
instruction (compare A to all registers, return the register that
matches). This is VERY wasteful of silicon and gets pretty slow
as the number of "registers" to compare against gets large. This
is why you almost never see caches with anything more than 4-way
associativity: the higher associativity is EXPENSIVE (both in
money and in speed) and doesn't get you very much in terms of
improved cache hit rate. Similarly, in order to build a giant
associative memory you would need to build a device at least as
complex as a modern computer, but you wouldn't have nearly the
expressive capability.
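
To make that concrete, here is a rough C sketch of what a fully
associative lookup has to do on every access. The entry count and
the structure layout are just my own illustration, not anything
anybody has proposed; the point is that in hardware every
iteration of this loop is a separate comparator, which is exactly
where the silicon goes:

  /* Sketch of a fully associative lookup. The software loop just
   * makes the per-entry comparison cost visible; real hardware
   * would perform all ENTRIES compares in parallel, one comparator
   * per entry. */
  #include <stddef.h>

  #define ENTRIES 4096            /* illustrative size only */

  struct assoc_mem {
      unsigned long tag[ENTRIES];
      unsigned long data[ENTRIES];
      int           valid[ENTRIES];
  };

  /* Return the data stored under 'key', or (unsigned long)-1 when
   * no valid entry has a matching tag. */
  unsigned long assoc_lookup(const struct assoc_mem *m,
                             unsigned long key)
  {
      size_t i;
      for (i = 0; i < ENTRIES; i++)
          if (m->valid[i] && m->tag[i] == key)
              return m->data[i];
      return (unsigned long)-1;
  }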

> People buy multiprocessors solely to run databases. They don't
> do graphics, they don't do much numerical calculation (and if so
> decimal is more practical since it doesn't require 2 conversions
> and can be done relatively quickly with tables.)

That's odd; I've worked at several different companies that were
using multi-processors to do exactly these things. While it may
be true that a large number of the MP machines being sold are
used for database servers, I don't think it is accurate to say
that that is the ONLY use for such machines. Especially not when
it is fairly clear that both numerical codes and graphical codes
get big wins from MP.

> My belief is that computers will (in the long run) evolve to
> increasingly resemble the human brain with many specialized
> parts each assigned to various tasks.  There is already some
> development along this line with graphic cards and for a lot
> of data processing problems disk logic is more crucial than CPU.

This is purely a technological artifact and cannot be expected
to hold true over the long run. Just in the past two decades we
have seen the technology shift toward, and away from, favoring
specialized coprocessors, just as we have seen the balance of
bandwidth shift among the different system elements (CPU,
memory, storage).

> [database processing] is a very important subset of computing
> (the software alone is a 10 billion + plus industry).  If it's
> possible to design an architecture which speeds up this small
> set of operations by an order of magnitude or more; then it
> (seems to me) that it doesn't matter if it's well balanced. A
> lot of companies might be better off going for lucrative niches
> rather than competing with well balanced architectures.

This would only be true if the resulting architecture could be
had for a reasonable cost compared to the alternatives, and to
the increase in value that the enhanced architecture provides
(most companies have other concerns besides just data processing,
so a large increase in data processing performance might only
translate to a small increase in profits). Large associative
memories tend to be very expensive compared to a fast processor
and a bunch of normal memory.

> > The Word Count utility I wrote in asm about 8 years ago (486-
> > optimized) could handle a user-specified set of word and line
> > separator characters, and it still ran faster than the disk
> > transfer speed from a RAM-disk, i.e. more than 10MB/s
> 
> This is interesting.  I'm not a hardware expert, but I thought
> 8 years ago the memory cycle was something like 70 nanoseconds.
> Assuming (and I believe it an incorrect assumption especially
> with RAM disk) that you could get a 4 byte word up every cycle
> this gives me something in the range 6  megabytes per second.

Check your math: at 70ns per access you can get about 14,285,714
accesses per second, which at 4 bytes per access yields about 57MB/s.
Actually, the 70ns figure refers to the full cycle time to get
a random word from memory. In systems with caches (and the 486
does have a small on-chip cache) you would be making page mode
accesses, where you get 4 words from memory in something like
130ns (70ns for the first word, 20ns for each of the remaining
words), so your peak bandwidth would be something like 123MB/s,
with the average falling somewhere in between.
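
If you want to check the arithmetic, the figures above fall out
of a few lines of C (the constants are the assumed ones from this
thread, not measurements of any real machine):

  /* Back-of-the-envelope bandwidth check, using the assumed 70ns
   * random cycle, 20ns per page-mode follow-up word, and 4-byte
   * words from the discussion above. */
  #include <stdio.h>

  int main(void)
  {
      double ns_random = 70.0;            /* full random access cycle */
      double ns_burst  = 70.0 + 3 * 20.0; /* 4 words via page mode    */

      printf("random:    %.0f accesses/s, %.1f MB/s\n",
             1e9 / ns_random, 4.0 * 1e9 / ns_random / 1e6);
      printf("page mode: %.1f MB/s peak\n",
             16.0 * 1e9 / ns_burst / 1e6);
      return 0;
  }
  /* prints about 14285714 accesses/s, 57.1 MB/s, and 123.1 MB/s */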

> I haven't studied 486 architecture but unless it has something
> like I described the operations to examine a dozen or so
> delimiters would take something like a dozen CPU cycles.  Not
> to mention the costs of noting the results and putting them
> someplace useful (eg. back in memory.)
> 

The 486 had a pretty low IPC rating, so you are probably off on
that cycles figure by a factor of 5 or so. Of course this probably
helps your point. The actual numbers should look something like
2 instructions for the delimiter lookup (we just use a 256-entry
lookup table filled with boolean values to determine whether a
char is a delimiter), 1 instr. for the conditional branch, 1
instr. for the recording (only needed about 1/5 of the time in
English text), 1 instr. for getting the next char., and 2 instr.
for the loop. This gives us about 6.2 instructions per iteration,
which would have been something like 25 cycles on a 486. On a
modern processor it would be more like 6 or 7 cycles. If you
unroll the loop you can get the number of instructions down much
closer to 3 per character.
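
In C the inner loop described above looks something like the
following. The names are mine and the original utility was
hand-written asm, so take this only as a sketch of the per-char
instruction mix, not as the actual program:

  /* Word-count inner loop using a 256-entry boolean table to
   * classify each byte: one table lookup, a branch, an occasional
   * word-boundary increment, plus the loop overhead per character. */
  #include <stddef.h>

  static unsigned char is_delim[256];   /* 1 if byte is a separator */

  size_t count_words(const unsigned char *buf, size_t len)
  {
      size_t words = 0;
      int in_word = 0;
      size_t i;

      for (i = 0; i < len; i++) {
          if (is_delim[buf[i]]) {        /* table lookup + branch   */
              in_word = 0;
          } else if (!in_word) {         /* a new word starts here  */
              in_word = 1;
              words++;                   /* the "recording" step    */
          }
      }
      return words;
  }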

> > 
> > The only real bottleneck on problems like these is the need
> > for a (small) lookup table to do text classification.

> Which would be an argument for at least some programmer
> controlled associative memory.  

Replacing a simple lookup table (which executes in O(1) for all
data) with an expensive associative memory is pure lunacy. The
associative memory couldn't provide the result any faster than
the lookup table (1 memory access), and it would cost far more
than a much larger amount of normal memory. For the classes of
problems that CAN'T be solved by the lookup table, you would
not be able to afford enough associative memory to contain the
problem set anyway (you can't afford enough normal memory for
such problems, at the moment). It is MUCH better to spend your
time and effort (not to mention money) on fast storage systems
and good data structures than on some exotic special purpose
doodad. A b-tree and a caching RAID controller will be much less
expensive, and give nearly as much of a performance boost, as
wads of associative memory.
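
For what it's worth, the "programmer controlled" part amounts to
nothing more than filling in that 256-entry table from the user's
separator list. A hypothetical setup routine (the name and
interface are mine, purely for illustration) is all it takes, and
after it runs, classifying a byte is a single indexed load:

  /* Build the 256-entry classification table from a user-supplied
   * string of separator characters. Lookup afterwards is O(1) with
   * no associative hardware involved. */
  #include <string.h>

  static unsigned char is_delim[256];

  void set_delimiters(const unsigned char *seps, size_t nseps)
  {
      size_t i;
      memset(is_delim, 0, sizeof is_delim);
      for (i = 0; i < nseps; i++)
          is_delim[seps[i]] = 1;        /* mark each separator byte */
  }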

- Jeff Dutky


