Optimizing CNS

Scott Ware ware at os.pharm.nwu.edu
Mon Mar 1 17:58:04 EST 1999


Steffen Graether (steffen at protein.biochem.queensu.ca) wrote:

> Could someone suggest how I could optimize cns on a linux pc running
> redhat 5.2? I find the three-fold execution time difference between our
> overused R10000-250 MHz Octane and underused PentiumII-333 Mhz PC rather
> large. 

It is possible to speed up CNS running under Linux/x86 by using more
aggressive optimization at compile time.  Compiling CNS using
fort77+f2c+gcc with the following optimization flags produces an
executable that is 20-40% faster than one built with the default
makefile:

F77OPT = -O3 -fomit-frame-pointer -ffast-math -malign-double

The -fomit-frame-pointer option tells the compiler to omit the frame
pointer if it is unnecessary, freeing a register for other operations. 
This can help quite a bit when compiling code for the register-poor x86
architecture, although it does prevent the use of a debugger.  

The -ffast-math option allows the compiler to perform some optimizations
that may violate the IEEE floating point rules.  I haven't had any
problems running code compiled with -ffast-math, but it can introduce
different rounding errors (not necessarily larger - just different) than
code compiled without this option.  Test executables created with 
-ffast-math thoroughly.
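
To illustrate why results can differ, here is a minimal Python sketch (not
part of CNS, and Python itself is not affected by -ffast-math).  It regroups
a sum by hand, which is one kind of reordering that relaxed floating point
rules permit, and then compares two sets of numbers with a tolerance.  The
helper name values_agree and its tolerances are hypothetical choices for
illustration only.

```python
import math

# Floating point addition is not associative, so evaluating the same sum
# in a different order can change the last bits of the result.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)


def values_agree(reference, candidate, rel_tol=1e-6, abs_tol=1e-12):
    """Hypothetical helper: compare numbers produced by two builds.

    Returns True only if both sequences have the same length and every
    pair of values matches within the given tolerances.
    """
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


if __name__ == "__main__":
    # The two groupings are not bit-identical with IEEE doubles...
    print(left_to_right == regrouped)
    # ...but they agree to well within a sensible tolerance.
    print(values_agree([left_to_right], [regrouped]))
```

A check of this kind, applied to the numerical output of a reference build
and a -ffast-math build of the same test job, is one way to test for
differences that are larger than rounding noise.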

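Pentium and P6-class processors access doubles most efficiently when they
are aligned to an 8-byte boundary, and the -malign-double option tells the
compiler to align them that way.  Because this changes the layout of
structures that contain doubles, everything that shares such structures
must be compiled with the same setting.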
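
As a rough picture of what 8-byte alignment means, the following Python
sketch does the offset arithmetic only.  It is not how gcc implements
-malign-double, and the function names align_up and straddles are
hypothetical.

```python
def align_up(offset, alignment=8):
    """Round offset up to the next multiple of alignment.

    alignment is assumed to be a power of two.
    """
    return (offset + alignment - 1) & ~(alignment - 1)


def straddles(offset, size=8, boundary=8):
    """True if a value of `size` bytes starting at `offset` spans two
    separate `boundary`-byte chunks of memory."""
    return offset // boundary != (offset + size - 1) // boundary


if __name__ == "__main__":
    # A double placed right after a 4-byte integer starts at offset 4
    # and spans two 8-byte chunks.
    print(straddles(4))
    # Padding it out to the next 8-byte boundary avoids that.
    print(align_up(4), straddles(align_up(4)))
```

The padding costs a little memory in exchange for keeping each double
within a single 8-byte chunk.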

The performance of code compiled with recent egcs g77 releases still seems
to lag behind the performance of code compiled with f2c and the gcc
compiler included in the same egcs release.  Hopefully, Intel's recent
decision to participate in optimizing the egcs compilers will result in 
compilers that produce faster code.

The modifications needed to compile CNS using the Portland Group's PGF77
compiler are trivial, and the results are very good.  PGF77 produces an
executable that is about 20% faster than f2c+gcc with the above
optimizations. 
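
The two speedups compound.  How they combine depends on what "N% faster"
is taken to mean, which is not spelled out above, so this Python sketch
computes the combined run time relative to the default build under both
readings.  The function names are hypothetical and the numbers are only
the percentages quoted in this message, not new measurements.

```python
def relative_runtime(percent_faster, as_throughput=False):
    """Run time relative to the baseline (1.0) for one speedup.

    as_throughput=False: "N% faster" means the run time drops by N%.
    as_throughput=True:  "N% faster" means N% more work per unit time.
    """
    frac = percent_faster / 100.0
    if as_throughput:
        return 1.0 / (1.0 + frac)
    return 1.0 - frac


def combined_runtime(percentages, as_throughput=False):
    """Multiply the relative run times of successive speedups."""
    total = 1.0
    for p in percentages:
        total *= relative_runtime(p, as_throughput)
    return total


if __name__ == "__main__":
    for as_throughput in (False, True):
        # f2c+gcc flags (20-40%) followed by PGF77 (about 20%).
        low = combined_runtime([40, 20], as_throughput)
        high = combined_runtime([20, 20], as_throughput)
        print(f"throughput reading={as_throughput}: "
              f"{low:.2f} to {high:.2f} of the default run time")
```

Under the first reading the PGF77 build would take roughly 0.48 to 0.64 of
the default run time; under the second, roughly 0.60 to 0.69.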

As David Konerding mentioned, CNS uses the highly optimized
complib.sgimath math libraries for performing FFTs on SGI systems.  These
optimized libraries are much faster than the FFTPACK Fortran routines that
are included with CNS.  A group at ORNL has ported Intel's optimized FFT
libraries from the ASCI RED project to Linux; however, these libraries are
not yet available.  I plan to adapt CNS for use with these libraries
when they are.

Of course, no optimization will boost a PII/333 to considerably more than
half of the general FP performance of an Octane R10k/250.  The PII can
compete with an R10k in an Indigo2 or O2, but the memory subsystems of the
Octane and Origin systems are much better.


--
Scott Ware                   NUMS-MPBC Macromolecular Crystallography
       303 East Chicago Avenue, Ward 8-264, Chicago, IL 60611
     PGP Public Key:  http://xtal.pharm.nwu.edu/~ware/public.txt



