DocumentCode :
1218022
Title :
Efficient Utilization of SIMD Extensions
Author :
Franchetti, Franz ; Kral, Stefan ; Lorenz, Juergen ; Ueberhuber, Christoph W.
Author_Institution :
Vienna University of Technology, Austria
Volume :
93
Issue :
2
Year :
2005
Firstpage :
409
Lastpage :
425
Abstract :
This paper targets automatic performance tuning of numerical kernels in the presence of multilayered memory hierarchies and single-instruction, multiple-data (SIMD) parallelism. The studied SIMD instruction set extensions include Intel's SSE family, AMD's 3DNow!, Motorola's AltiVec, and IBM's BlueGene/L SIMD instructions. FFTW, ATLAS, and SPIRAL demonstrate that near-optimal performance of numerical kernels across a variety of modern computers featuring deep memory hierarchies can be achieved only by means of automatic performance tuning. These software packages generate and optimize ANSI C code and feed it into the target machine's general-purpose C compiler to maintain portability. The scalar C code produced by performance tuning systems poses a severe challenge for vectorizing compilers. The particular code structure hampers automatic vectorization and, thus, inhibits satisfactory performance on processors featuring short vector extensions. This paper describes special-purpose compiler technology that supports automatic performance tuning on machines with vector instructions. The work described includes: 1) symbolic vectorization of digital signal processing transforms; 2) straight-line code vectorization for numerical kernels; and 3) compiler back ends for straight-line code with vector instructions. Methods from all three areas were combined with FFTW, SPIRAL, and ATLAS to optimize both for memory hierarchy and vector instructions. Experiments show that the presented methods lead to substantial speedups (up to 1.8 for two-way and 3.3 for four-way vector extensions) over the best scalar C codes generated by the original systems as well as roughly matching the performance of hand-tuned vendor libraries.
Keywords :
instruction sets; mathematics computing; operating system kernels; parallel programming; parallelising compilers; software libraries; software packages; software portability; ANSI C code; Abbreviated Test Language for Avionic Systems; American National Standards Institute; C compiler; SIMD extensions; automatic performance tuning; automatic vectorization; compiler back ends; digital signal processing transforms; hand tuned vendor libraries; multilayered memory hierarchies; numerical kernels; performance tuning systems; short vector extensions; single instruction multiple data parallelism; special purpose compiler technology; straight line code vectorization; vector instructions; vectorizing compilers; Boosting; Computer aided instruction; Computer applications; Concurrent computing; Digital signal processing; Kernel; Parallel processing; Registers; Signal processing algorithms; Spirals; digital signal processing (DSP); fast Fourier transform (FFT); short vector single instruction, multiple data (SIMD); symbolic vectorization
Language :
English
Journal_Title :
Proceedings of the IEEE
Publisher :
IEEE
ISSN :
0018-9219
Type :
jour
DOI :
10.1109/JPROC.2004.840491
Filename :
1386659