Follow-up: AltiVec and PowerMac G4s

Bernard de la Cruz delacruz at biomail.ucsd.edu
Wed Sep 15 18:09:51 EST 1999


This is in response to the thread on AltiVec and the Apple PowerMac G4s.

I don't claim to be an expert, but I asked some folks involved with the
development of AltiVec/Velocity Engine support about the issue.

To summarize:  First, you'll have to use a non-gcc compiler.  Apparently
the gcc group does not intend to support AltiVec; however, Apple has
resources for capable compilers at their web site.  Second, given the
right compiler, you don't even need to rewrite the code--it will
automatically take appropriate functions and chunk them for passing to
AltiVec.  Third, as a parallel vector processor, AltiVec can handle FP of
at least 32-bit precision.  Given 128-bit wide paths, you should be able
to pass four 32-bit calculations to AltiVec at the same time.
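
The "4 calculations" figure is just the register width divided by the
element width.  A minimal sketch of that arithmetic in plain C follows;
the names VECTOR_BITS and lanes_per_vector are made up for illustration
and are not part of any AltiVec interface.

```c
#include <assert.h>

enum { VECTOR_BITS = 128 };  /* width of one vector register, per the text above */

/* Hypothetical helper: how many elements of the given width (in bits)
   fit side by side in one 128-bit vector. */
int lanes_per_vector(int element_bits)
{
    assert(element_bits > 0 && VECTOR_BITS % element_bits == 0);
    return VECTOR_BITS / element_bits;
}
```

With 32-bit floats this gives 4 lanes; with single bytes, as in the
excerpt below, it gives 16.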

I've included some of the response below with permission from its author.
I hope it is helpful.

Bernard de la Cruz
______________________________________________________________________________

Bernard J. de la Cruz                      <delacruz at biomail.ucsd.edu>
Biology Ph.D. Candidate                    LAB: 858/534-3107
UC San Diego                               FAX: 858/534-0053
Bonner 3205, 9500 Gilman Drive, MC 0322 La Jolla, California 92093-0322

---begin excerpted file---

> but i'm confused about the meaning of the "vector" calculations.

  Vector operations group similar data and perform the same operation
on all of them in parallel.  For instance, if you had byte arrays, A
and B, and you wanted to add each byte in A to its corresponding byte
in B and store them in C, then you would traditionally do this:

typedef unsigned char byte;  /* "byte" is not a built-in C type */
byte A[512];
byte B[512];
byte C[512];
int  i;
for ( i = 0; i < 512; i++ ) C[i] = A[i] + B[i];

This code fragment would load a byte value into a register from the
memory allocated to A, load a byte value into another register from
the memory allocated to B, add them together, and store the result
into the memory allocated to C, 512 times.  That's 1536 accesses to
memory and 512 adds, total.

If you have 128-bit vector registers, then you can use one load
operation to read 16 bytes at a time into a register, one addition
operation to add two vector registers' worth of bytes together,
and one store operation to write them all back into memory as one
contiguous 16-byte chunk.  Hence:

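/* AltiVec C extensions: "vector" types and the vec_add() operation */
vector unsigned char A[32]; /* 32 vectors x 16 bytes/vector = 512 bytes */
vector unsigned char B[32];
vector unsigned char C[32];
int i;
for ( i = 0; i < 32; i++ ) C[i] = vec_add( A[i], B[i] );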

This code does exactly the same amount of useful work as the previous
code fragment, except that it only has to use 96 memory accesses and
32 additions to do it.

(Note it's not quite as cut-and-dried as this: more of the first code
fragment's memory accesses will actually be cache accesses, and since the
early G4s are on motherboards with 64-bit main memory buses, "one" vector
load will actually take two bus cycles, etc., but you still see lots of
gain from using vectors.)
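
For readers without an AltiVec compiler, the access counts above can be
checked with a portable C sketch.  It is only a model: the vec16 struct
and the add_scalar/add_chunked functions are hypothetical stand-ins, the
"vector" operations are ordinary memcpy calls and loops, and the counters
simply tally one access per load or store as the text does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { VEC_BYTES = 16 };  /* one 128-bit register holds 16 bytes */

/* Hypothetical stand-in for a 128-bit vector register. */
typedef struct { unsigned char lane[VEC_BYTES]; } vec16;

/* Byte-at-a-time add.  Counts two loads and one store per byte and
   returns the total number of memory accesses. */
size_t add_scalar(const unsigned char *a, const unsigned char *b,
                  unsigned char *c, size_t n)
{
    size_t accesses = 0;
    for (size_t i = 0; i < n; i++) {
        c[i] = (unsigned char)(a[i] + b[i]);
        accesses += 3;
    }
    return accesses;
}

/* Same work in 16-byte chunks.  Counts two "vector loads" and one
   "vector store" per chunk.  n must be a multiple of 16. */
size_t add_chunked(const unsigned char *a, const unsigned char *b,
                   unsigned char *c, size_t n)
{
    size_t accesses = 0;
    assert(n % VEC_BYTES == 0);
    for (size_t i = 0; i < n; i += VEC_BYTES) {
        vec16 va, vb, vc;
        memcpy(va.lane, a + i, VEC_BYTES);      /* models one vector load  */
        memcpy(vb.lane, b + i, VEC_BYTES);      /* models one vector load  */
        for (int k = 0; k < VEC_BYTES; k++)     /* models one vector add   */
            vc.lane[k] = (unsigned char)(va.lane[k] + vb.lane[k]);
        memcpy(c + i, vc.lane, VEC_BYTES);      /* models one vector store */
        accesses += 3;
    }
    return accesses;
}
```

Called on 512-byte arrays, add_scalar reports 1536 accesses and
add_chunked reports 96, and both produce the same bytes in the output
array.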

<stuff deleted>

There are two ways a compiler can support vector architectures --
implicitly and explicitly.  Implicit support involves analyzing non-vector
code and trying to automatically convert it into vector code (e.g., from
the first code fragment in this post into the second).  This has the
advantage of making the reams and reams of old code we have lying around
suddenly AltiVec-enabled with a simple recompile.


<stuff deleted>

Explicit support is easier for the compiler, but harder for the
programmer.  It involves adding keywords to the compiled language so that
programmers can write their code to use vector data types, as described in
the AltiVec Technology Programming Interface Manual and illustrated in my
second code fragment.  This can give much better-performing code than
implicit vector code, since the human coder has a better idea of what's
"safe" and what isn't than does the compiler.  However, it involves more
work (training and application of intelligence) on the programmer's part
and makes the conversion of old code to AltiVec-utilizing code arduous.
It limits your ability to support the code (i.e., a programmer who is not
trained in using the vector language extensions cannot maintain source
code that uses them, and automated quality assurance tools which need to
parse the sources will choke and die on the extensions, making them
unusable until the company selling the tool can be coaxed into rewriting
the tool to support the extensions).  It also opens the possibility of
having multiple, mutually incompatible implementations of compilers which
extend the language, damaging portability of code and limiting one's
options for migrating between development environments.
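
One reading of "safe" here (an assumption on my part, not something the
excerpt spells out) is whether a loop's iterations are independent of each
other.  The portable C sketch below shows a loop where they are not: each
iteration reads the value the previous one just wrote, so blindly
processing 16 bytes per step changes the answer.  The function names are
hypothetical and the chunking is simulated with memcpy rather than real
vector instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CHUNK = 16 };  /* bytes handled per simulated vector operation */

/* Scalar loop with a loop-carried dependency:
   p[i + 1] depends on the p[i] written one iteration earlier. */
void prefix_scalar(unsigned char *p, size_t n)
{
    assert(p != NULL);
    for (size_t i = 0; i + 1 < n; i++)
        p[i + 1] = (unsigned char)(p[i] + 1);
}

/* Naive "vectorized" version of the same loop.  It loads a whole chunk
   before storing anything, so it reads stale values and is NOT
   equivalent to prefix_scalar. */
void prefix_chunked_naive(unsigned char *p, size_t n)
{
    assert(p != NULL);
    for (size_t i = 0; i + 1 < n; i += CHUNK) {
        unsigned char tmp[CHUNK];
        size_t len = (n - 1 - i < CHUNK) ? (n - 1 - i) : CHUNK;
        memcpy(tmp, p + i, len);            /* "vector load"  p[i .. i+len-1] */
        for (size_t k = 0; k < len; k++)    /* "vector add" of 1 to each lane */
            tmp[k] = (unsigned char)(tmp[k] + 1);
        memcpy(p + i + 1, tmp, len);        /* "vector store" p[i+1 .. i+len] */
    }
}
```

Starting from an all-zero array, the scalar loop produces 0, 1, 2, 3, ...
while the naive chunked loop produces mostly 1s.  A compiler doing
implicit vectorization has to prove this kind of dependency is absent
before converting a loop; a programmer writing explicit vector code
already knows whether it is.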

There's another way an application can support AltiVec, and that's by
using AltiVec-enabled libraries.  Libraries are collections of functions
which are used in a variety of tasks.  For instance, there is a library
function for calculating the length of a string, or for converting a
TIFF-formatted image into a JPEG-formatted image.  The programmers just
call functions from these libraries so they do not have to write them
themselves.  The libraries are then "linked" into their code either when
the application is compiled from source code into an executable binary
(static linking) or when the application is loaded from disk on the
end-user's computer and executed (dynamic linking).  In either case, if a
library gets replaced with another library that "looks" the same (i.e.,
same function names, same arguments, etc.) but uses completely different
code to get its task done, then the application will happily use the new
library without any changes having to be made to the non-library part of
the code.
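
The library idea can be sketched in plain C.  Everything here is
illustrative: add_bytes_fn, add_bytes_scalar, add_bytes_chunked and
app_total are made-up names, and a function pointer stands in for the
choice that the linker or loader would really make.  The chunked version
is ordinary C shaped like vector code, not actual AltiVec.

```c
#include <assert.h>
#include <stddef.h>

/* The "library interface": the name, arguments, and meaning are all the
   application ever sees. */
typedef void (*add_bytes_fn)(const unsigned char *a, const unsigned char *b,
                             unsigned char *c, size_t n);

/* Implementation 1: plain byte-at-a-time loop. */
void add_bytes_scalar(const unsigned char *a, const unsigned char *b,
                      unsigned char *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = (unsigned char)(a[i] + b[i]);
}

/* Implementation 2: 16-byte chunks plus a leftover tail, standing in for
   an AltiVec-enabled replacement library. */
void add_bytes_chunked(const unsigned char *a, const unsigned char *b,
                       unsigned char *c, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16)
        for (size_t k = 0; k < 16; k++)  /* would be one vector add */
            c[i + k] = (unsigned char)(a[i + k] + b[i + k]);
    for (; i < n; i++)                   /* bytes that do not fill a chunk */
        c[i] = (unsigned char)(a[i] + b[i]);
}

/* "Application" code: written once against the interface, unchanged no
   matter which implementation is supplied. */
unsigned app_total(add_bytes_fn add_bytes)
{
    enum { N = 40 };
    unsigned char a[N], b[N], c[N];
    unsigned total = 0;
    assert(add_bytes != NULL);
    for (size_t i = 0; i < N; i++) {
        a[i] = (unsigned char)i;
        b[i] = (unsigned char)(2 * i);
    }
    add_bytes(a, b, c, N);
    for (size_t i = 0; i < N; i++)
        total += c[i];
    return total;
}
```

Both app_total(add_bytes_scalar) and app_total(add_bytes_chunked) return
2340; the application code is identical in the two cases, which is the
point of the paragraph above.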

---end excerpted file---




