Follow-up: AltiVec and PowerMac G4s

Bernard de la Cruz delacruz at
Wed Sep 15 18:09:51 EST 1999

This is in response to the thread on AltiVec and the Apple PowerMac G4s.

I don't claim to be an expert but I asked some folks involved with
development of AltiVec/Velocity Engine support about the issue.

To summarize:  First, you'll have to use a non-gcc compiler.  Apparently
the gcc group does not intend to support AltiVec; however, Apple has
[...] for capable compilers at their web site.  Second, given the right
compiler, you don't even need to rewrite the code--it will automatically
take functions and chunk them for passing to AltiVec.  Third, as a
parallel processor, AltiVec can handle FP of at least 32-bit precision.
Given 128-bit wide paths, you should be able to pass 4 calculations
(128 / 32) to AltiVec at the same time.

I've included some of the response below with permission from its
author.  I hope it is helpful.

Bernard de la Cruz

Bernard J. de la Cruz                      <delacruz at>
Biology Ph.D. Candidate                    LAB: 858/534-3107
UC San Diego                               FAX: 858/534-0053
Bonner 3205, 9500 Gilman Drive, MC 0322 La Jolla, California 92093-0322

---begin excerpted file---

> but i'm confused about the meaning of the "vector" calculations.

  Vector operations group similar data and perform the same operation
on all of them in parallel.  For instance, if you had byte arrays, A
and B, and you wanted to add each byte in A to its corresponding byte
in B and store them in C, then you would traditionally do this:

byte A[512];
byte B[512];
byte C[512];
int  i;
for ( i = 0; i < 512; i++ ) C[i] = A[i] + B[i];

This code fragment would load a byte value into a register from the
memory allocated to A, load a byte value into another register from
the memory allocated to B, add them together, and store the result
into the memory allocated to C, 512 times.  That's 1536 accesses to
memory and 512 adds, total.

If you have 128-bit vector registers, then you can use one load
operation to read 16 bytes at a time into a register, one addition
operation to add two vector registers' worth of bytes together,
and one store operation to write them all back into memory as one
contiguous 16-byte chunk.  Hence:

vector byte A[32]; /* 32 vectors x 16 bytes/vector = 512 bytes */
vector byte B[32];
vector byte C[32];
int i;
for ( i = 0; i < 32; i++ ) C[i] = A[i] + B[i];

This code does the exact same amount of useful work as the previous
code fragment, except that it only had to use 96 memory accesses and
32 additions to do it.
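
The counts quoted above can be checked with a small portable C sketch.
It does not use AltiVec at all: the hypothetical chunk16 struct only
stands in for a 128-bit register, and the counters type and the two
function names are made up for this illustration.  The inner loop in
count_chunked models what would be a single vector add on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N     512   /* bytes per array, as in the fragments above */
#define CHUNK 16    /* bytes held by one 128-bit vector register  */

/* Hypothetical stand-in for a 128-bit register; not a real AltiVec type. */
typedef struct { unsigned char lane[CHUNK]; } chunk16;

/* Tally of operations, counted the same way the text counts them. */
typedef struct { int loads, stores, adds; } counters;

/* One byte per step: two loads, one add, one store for every element. */
counters count_scalar(const unsigned char *a, const unsigned char *b,
                      unsigned char *c)
{
    counters k = {0, 0, 0};
    for (size_t i = 0; i < N; i++) {
        unsigned char x = a[i];             /* load from A        */
        unsigned char y = b[i];             /* load from B        */
        c[i] = (unsigned char)(x + y);      /* add, store into C  */
        k.loads += 2; k.adds += 1; k.stores += 1;
    }
    return k;
}

/* Sixteen bytes per step: two loads, one add, one store for every chunk. */
counters count_chunked(const unsigned char *a, const unsigned char *b,
                       unsigned char *c)
{
    counters k = {0, 0, 0};
    assert(N % CHUNK == 0);
    for (size_t i = 0; i < N; i += CHUNK) {
        chunk16 x, y, z;
        memcpy(x.lane, a + i, CHUNK);       /* one 16-byte load from A */
        memcpy(y.lane, b + i, CHUNK);       /* one 16-byte load from B */
        for (int j = 0; j < CHUNK; j++)     /* models ONE vector add   */
            z.lane[j] = (unsigned char)(x.lane[j] + y.lane[j]);
        memcpy(c + i, z.lane, CHUNK);       /* one 16-byte store to C  */
        k.loads += 2; k.adds += 1; k.stores += 1;
    }
    return k;
}
```

Adding loads and stores together gives the "memory accesses" figure used
in the text: 1536 for the scalar version and 96 for the chunked one.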

(Note it's not quite as cut-n-dried as this .. more of the first code
fragment's memory access will actually be cache accesses, and since the
early G4's are on motherboards with 64-bit main memory busses "one"
load will actually take two bus cycles, etc, but you still see lots of
gain from using vectors.)

<stuff deleted>

There are two ways a compiler can support vector architectures --
implicitly and explicitly.  Implicit support involves analyzing
code and trying to automatically convert it into vector code (eg, from
the first code fragment in this post into the other).  This has the
advantage of making the reams and reams of old code we have lying around
suddenly altivec-enabled with a simple recompile.
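
As a rough illustration of what implicit support asks of the source
code, here is the byte-add loop written as an ordinary C99 function.
The name add_bytes_plain is hypothetical.  The restrict qualifiers are
an assumption about what helps: they promise the compiler that the three
arrays do not overlap, which is one of the things it has to establish
before it can safely process 16 bytes per step.  Whether a particular
compiler actually emits vector instructions for this loop depends on the
compiler and its options.

```c
#include <assert.h>
#include <stddef.h>

/* Plain scalar source: c[i] = a[i] + b[i] for n bytes.
   Nothing here is AltiVec-specific; any vectorizing is left to the compiler. */
void add_bytes_plain(unsigned char *restrict c,
                     const unsigned char *restrict a,
                     const unsigned char *restrict b,
                     size_t n)
{
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        c[i] = (unsigned char)(a[i] + b[i]);   /* wraps modulo 256 */
}
```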

<stuff deleted>

Explicit support is easier for the compiler, but harder for the
programmer.  It involves adding keywords to the compiled language so
that programmers can write their code to use vector data types, as
described in the AltiVec Technology Programming Interface Manual and
illustrated in my second code fragment.  This can give much
better-performing code than implicitly vectorized code, since the human
coder has a better idea of what's "safe" and what isn't than does the
compiler.  But it involves more work (training and application of
intelligence) on the programmer's part, makes the conversion of old code
to altivec-utilizing code arduous, limits your ability to support the
code (ie, a programmer who is not trained in using the language
extensions cannot maintain source code that uses them, and quality
assurance tools which need to parse the sources will choke on the
extensions, making them unusable until the company selling the tool can
be coaxed into rewriting the tool to support the extension), and raises
the possibility of having multiple, mutually-incompatible compilers
which extend the language, damaging portability of code and limiting
one's options for migrating between development environments.
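
For a concrete feel of the explicit route, here is a hedged sketch of
the byte-add written against the vec_ld, vec_add and vec_st operations
from the AltiVec programming interface.  The function name
add_bytes_explicit is hypothetical, and the sketch assumes a compiler
that defines __ALTIVEC__ and provides <altivec.h> when AltiVec support
is switched on.  On any other compiler it falls back to a scalar loop,
which is itself a small example of the portability cost described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* c[i] = a[i] + b[i] for n bytes.
   Assumes n is a multiple of 16 and all three pointers are 16-byte
   aligned, because vec_ld and vec_st ignore the low four address bits. */
void add_bytes_explicit(unsigned char *c, const unsigned char *a,
                        const unsigned char *b, size_t n)
{
    assert(n % 16 == 0);
    assert((uintptr_t)a % 16 == 0 && (uintptr_t)b % 16 == 0 &&
           (uintptr_t)c % 16 == 0);
#ifdef __ALTIVEC__
    for (size_t i = 0; i < n; i += 16) {
        vector unsigned char va = vec_ld(0, a + i);  /* load 16 bytes of A */
        vector unsigned char vb = vec_ld(0, b + i);  /* load 16 bytes of B */
        vec_st(vec_add(va, vb), 0, c + i);           /* add, store 16 bytes */
    }
#else
    /* Scalar fallback for compilers without the vector extensions. */
    for (size_t i = 0; i < n; i++)
        c[i] = (unsigned char)(a[i] + b[i]);
#endif
}
```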

There's another way an application can support altivec, and that's by
using altivec-enabled libraries.  Libraries are collections of functions
which are used in a variety of tasks.  For instance, there is a library
function for calculating the length of a string, or for converting a
tiff-formatted image into a jpeg-formatted image.  The programmers just
call functions from these libraries so they do not have to write them
themselves.  The libraries are then "linked" into their code either when
the application is compiled from source code into an executable binary
(static linking) or when the application is loaded from disk on the
end-user's computer and executed (dynamic linking).  In either case, if
a library gets replaced with another library that "looks" the same (ie,
same function names, same arguments, etc) but uses completely different
code to get its task done, then the application will happily use the new
library without any changes having to be made to the non-library part of
the application.
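
The library idea can be imitated inside a single C file.  This is only a
stand-in: real static or dynamic linking is done by the linker or the
loader, not by a function pointer, and every name here (add_bytes_fn,
add_bytes_v1, add_bytes_v2, lib_add_bytes, sum_of_sums) is hypothetical.
The point it tries to show is that the application code calls one agreed
interface and never changes when the code behind it does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The interface both "libraries" agree on: same arguments, same meaning. */
typedef void (*add_bytes_fn)(unsigned char *c, const unsigned char *a,
                             const unsigned char *b, size_t n);

/* "Old library": one byte per step. */
void add_bytes_v1(unsigned char *c, const unsigned char *a,
                  const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = (unsigned char)(a[i] + b[i]);
}

/* "New library": 16 bytes per step, standing in for an altivec-enabled
   build, with a scalar loop for any leftover bytes. */
void add_bytes_v2(unsigned char *c, const unsigned char *a,
                  const unsigned char *b, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        unsigned char x[16], y[16];
        memcpy(x, a + i, 16);
        memcpy(y, b + i, 16);
        for (int j = 0; j < 16; j++)
            x[j] = (unsigned char)(x[j] + y[j]);
        memcpy(c + i, x, 16);
    }
    for (; i < n; i++)
        c[i] = (unsigned char)(a[i] + b[i]);
}

/* Stand-in for the binding that the linker or loader would make. */
add_bytes_fn lib_add_bytes = add_bytes_v1;

/* Application code: it only knows the interface, not the implementation. */
unsigned sum_of_sums(const unsigned char *a, const unsigned char *b, size_t n)
{
    unsigned char c[256];
    unsigned total = 0;
    assert(n <= sizeof c);
    lib_add_bytes(c, a, b, n);
    for (size_t i = 0; i < n; i++)
        total += c[i];
    return total;
}
```

Pointing lib_add_bytes at add_bytes_v2 plays the role of dropping in the
replacement library; sum_of_sums is untouched and returns the same
answer.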

---end excerpted file---

More information about the Mol-evol mailing list