TFP benchmarks

Mark A. Saper saper at UMICH.EDU
Tue Jun 21 13:23:48 EST 1994

Some of you may be interested in this.  This note is from SGI.   It would be 
nice to see the test done with X-PLOR as that is mostly number crunching rather 
than I/O.

              SGI Power Challenge Performance on
           Howard Hughes Medical Institute Benchmarks
                         June 7, 1994

SGI recently performed preliminary benchmark measurements of the
Small and Large PROTIN and PROLSQ benchmarks from the Howard Hughes
Medical Institute of Columbia University on the new SGI Power Challenge
system.  These benchmarks were performed using pre-release versions
of compiler, libraries, and operating systems software, so these results
should be interpreted as preliminary and subject to change before the
final release of software for customer shipments.

Attached is the standard Benchmark Submission Form with the timing results.
We have the following comments to add to this form in interpreting these
results.

First, we have not reported compile times for these codes because we are
currently benchmarking in a cross-compilation environment where the
benchmarks were compiled and built on Challenge machines with R4400
processors.  Thus, any compile times in this environment would be
misleading and unrepresentative of compiling programs natively on the
Power Challenge.

The elapsed execution times for the small and large PROTIN runs for both
the Challenge with 150 MHz R4400 processors and for the Power Challenge
with 75 MHz TFP processors are as follows:

PROTIN code                Elapsed Times in Seconds
                              Small       Large

Challenge, R4400              0.66         7.76
Power Challenge, TFP          0.75         8.21

The profiles of these runs show that almost all of the run time is
spent in I/O routines for performing formatted reads and writes.  The TFP
is slightly slower than the R4400 for two reasons:  (1) formatted I/O
is not dominated by floating-point vector operations that the TFP
architecture was designed to excel at, and  (2) no effort has yet been
put into optimizing the performance of the I/O runtime library of the
Challenge compilers.

The following are the corresponding elapsed times for the execution of
the PROLSQ runs.

PROLSQ code                         Elapsed Times in Seconds
                                        Small       Large

Challenge, 1 R4400 processor           14.45        2646
Challenge, 2 R4400 processors          11.09        1436
Challenge, 4 R4400 processors           8.57         768

Power Challenge, 1 TFP processor        9.08         700
Power Challenge, 2 TFP processors       8.97         458
Power Challenge, 4 TFP processors       8.38         297

The Small PROLSQ run is also dominated by the time spent in formatted I/O,
especially in the TFP runs.  The TFP has accelerated the computation in
the Small PROLSQ run to such a degree that only 50% of the run time is
spent in the PROLSQ program itself, with the other 50% spent in I/O.
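
To put the 50% figure in context, here is a minimal sketch of Amdahl's law
applied to the Small PROLSQ run.  It assumes (the note does not say this)
that the I/O half is entirely serial and that the compute half would scale
perfectly across processors.  The function name amdahl_speedup and the
variable names are illustrative only.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup when only the non-serial part scales."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


# Assumption: the ~50% I/O share quoted in the note is entirely serial.
io_fraction = 0.5

for p in (1, 2, 4):
    bound = amdahl_speedup(io_fraction, p)
    print(f"{p} processor(s): at most {bound:.2f}x")

# Observed Small PROLSQ elapsed times on the Power Challenge (1 and 4 TFPs),
# taken from the PROLSQ table in the note.
observed = 9.08 / 8.38
print(f"observed 1 -> 4 processors: {observed:.2f}x")
```

Under these assumptions the 4-processor bound is 1.6x, while the measured
improvement is smaller, which would be consistent with additional
parallel overhead on top of the I/O time.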

The large PROLSQ run is the only test from this suite that performs enough
computation to demonstrate the capabilities of the TFP processor.  Here,
we see that the single-processor TFP run is 3.8 times faster than the
single-processor R4400 run, and the 4-processor TFP run is 2.6 times
faster than the 4-processor R4400 run.  The parallel speedups are not as
high on the Power Challenge as on the Challenge because the TFP processor
has significantly accelerated the computational work inside of the
parallel regions, but has not accelerated formatted I/O and parallel
overhead, both of which perform about the same on the Power Challenge
as on the Challenge.
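
The ratios quoted for the Large PROLSQ run can be re-derived from the
table with a short script.  This is only a sketch: the timings are copied
from the table, and the Karp-Flatt metric (an estimate of the serial
fraction implied by a measured speedup) is an addition for illustration,
not something SGI reported.  All names below are hypothetical.

```python
# Large PROLSQ elapsed times in seconds, copied from the table in the note.
large_prolsq = {
    "R4400": {1: 2646.0, 2: 1436.0, 4: 768.0},
    "TFP": {1: 700.0, 2: 458.0, 4: 297.0},
}


def parallel_speedup(times: dict, p: int) -> float:
    """Speedup of the p-processor run over the 1-processor run."""
    return times[1] / times[p]


def karp_flatt(speedup: float, p: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


# TFP versus R4400 at equal processor counts.
for p in (1, 2, 4):
    ratio = large_prolsq["R4400"][p] / large_prolsq["TFP"][p]
    print(f"{p} processor(s): TFP is {ratio:.1f}x faster than R4400")

# Parallel speedup and implied serial fraction on each machine.
for cpu, times in large_prolsq.items():
    for p in (2, 4):
        s = parallel_speedup(times, p)
        e = karp_flatt(s, p)
        print(f"{cpu}, {p} processors: speedup {s:.2f}x, serial fraction {e:.2f}")
```

With these timings the implied serial fraction at 4 processors comes out
at about 5% on the R4400 and about 23% on the TFP, which matches the
explanation above: the unaccelerated I/O and overhead make up a larger
share of the faster machine's run time.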

Dr. Mark A. Saper                                      saper at
Biophysics Research Division                     Phone: (313) 764-3353    
University of Michigan                             FAX: (313) 764-3323          
930 N. University Ave., Ann Arbor, MI 48109-1055
