Title :
Approaching a machine-application bound in delivered performance on scientific code
Author :
Mangione-Smith, William H. ; Shih, T.-P. ; Abraham, Santosh G. ; Davidson, Edward S.
Author_Institution :
Motorola, Schaumburg, IL, USA
fDate :
8/1/1993 12:00:00 AM
Abstract :
A performance bounding methodology that explains the performance of loop-dominated scientific applications on particular systems is presented. The throughput of key hardware units that are common bottlenecks in concurrent machines is modeled. A workload characterization is proposed, and upper bounds on the performance of specific machine-workload pairs are derived. Comparing delivered performance with bounds focuses attention on areas for improvement and indicates how much improvement might be attainable. A detailed analysis and performance improvement effort for the IBM RS/6000 produced an average lower bound of 1.27 clocks per floating-point operation (CPF), whereas machine peak performance is 0.5 CPF and the V2.01 Fortran compiler attains only 2.43 CPF. Code improvements in this study have achieved 1.36 CPF, increasing the harmonic mean steady-state inner loop performance to 97.6% of the MFLOPS bound. Subsequently, the V2.02 compiler achieved 1.75 CPF, and 1.60 CPF with carefully chosen preprocessing.
Keywords :
natural sciences computing; parallel machines; performance evaluation; program compilers; program testing; CPF; Fortran compiler; IBM RS/6000; MFLOPS bound; V2.01; V2.02; concurrent machines; floating-point operation; harmonic mean steady-state inner loop performance; loop-dominated scientific applications; machine peak performance; machine-application bound; machine-workload pairs; performance bounding methodology; preprocessing; scientific code; workload characterization; Application software; Bandwidth; Clocks; Delay; Hardware; Kernel; Performance analysis; Registers; Throughput; Upper bound
Journal_Title :
Proceedings of the IEEE