DocumentCode :
2576424
Title :
Performance optimisations of the NPB FT kernel by special-purpose unroller
Author :
Getov, Vladimir ; Wei, Yuan ; Carter, Larry ; Kang Su Garlin
Author_Institution :
Sch. of Comput. Sci., Westminster Univ., Harrow, UK
fYear :
1999
fDate :
3-5 Feb 1999
Firstpage :
84
Lastpage :
88
Abstract :
The fast Fourier transform (FFT) is the cornerstone of many supercomputer applications and therefore needs careful performance tuning. Most often, however the real performance of the FFT implementations is far below the acceptable figures. In this paper we explore several strategies for performance optimisations of the FFT computation, such as enhancing instruction-level parallelism, loop merging, and reducing the memory loads and stores by using a special-purpose automatic loop unroller. Our approach is based on the principle of complete unrolling which we apply to modify the FT kernel of the NAS Parallel Benchmarks (NPB). In experiments on two different IBM SP2 platforms, our automatically generated unrolled FFT subroutine is shown to improve the performance between 40% and 53% in comparison with the original code. Further the execution time of the entire 3-D FFT mega-step of the benchmark is faster than when calls to a similar FFT subroutine from the vendor-optimised PESSL numerical library are used. Preliminary results suggest that the completely unrolled code also outperforms FFTW another high-performance FFT package. Finally, our approach for automatic generation of moderately optimised but specialised codes requires only a modest amount of programming effort
Keywords :
fast Fourier transforms; instruction sets; optimisation; parallel processing; performance evaluation; FFT implementations; NPB FT kernel; fast Fourier transform; instruction-level parallelism; loop merging; performance optimisations; performance tuning; special-purpose unroller; Algorithms; Computer aided instruction; Concurrent computing; Fast Fourier transforms; Kernel; Libraries; Merging; Optimization; Parallel processing; Supercomputers;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Parallel and Distributed Processing, 1999. PDP '99. Proceedings of the Seventh Euromicro Workshop on
Conference_Location :
Funchal
Print_ISBN :
0-7695-0059-5
Type :
conf
DOI :
10.1109/EMPDP.1999.746649
Filename :
746649
Link To Document :
بازگشت