Title :
High performance finite impulse response filter on graphics processors
Author :
Rongxin Qu ; Chunhong Zhang ; Jinkuan Wang ; Yun Wei
Author_Institution :
Sch. of Comput. Eng., Northeastern Univ., Qin Huang Dao, China
Abstract :
A high-performance FIR filtering algorithm for the GPU is presented, based on the traditional overlap-save method for fast FIR filtering. The algorithm uses a symmetric segmentation approach to partition the input data into blocks for processing; this approach optimizes GPU memory access and minimizes branch divergence within a warp. In addition, to improve performance for filters with short tap lengths, a zero-padding method extends the short time-domain FIR coefficients to the length at which the GPU FFT library performs best. The algorithm achieves a host-memory-to-host-memory throughput of over 600M samples per second on the NVIDIA Tesla M2090, with typical speedups of 4 to 6 times over Intel IPP for large chunk sizes.
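The overlap-save idea and the zero-padding of short filter taps described in the abstract can be illustrated with a minimal CPU sketch. It assumes a power-of-two FFT length as a stand-in for "the best size for the FFT library"; the paper's actual size choice, its symmetric segmentation scheme, and all GPU-specific details are not modeled. The function names `overlap_save`, `next_pow2`, and `direct_fir` are hypothetical and chosen only for illustration.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(X):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]


def next_pow2(n):
    """Smallest power of two >= n (assumed here to be a 'fast' FFT size)."""
    p = 1
    while p < n:
        p *= 2
    return p


def overlap_save(x, h, fft_size=None):
    """Filter x with FIR taps h by overlap-save; returns len(x) samples,
    equal to the first len(x) samples of the linear convolution x * h."""
    m = len(h)
    if m == 0:
        raise ValueError("h must contain at least one tap")
    if fft_size is None:
        # Assumption: a power of two of at least 2*m taps is a good FFT size.
        fft_size = next_pow2(2 * m)
    if fft_size < m or (fft_size & (fft_size - 1)) != 0:
        raise ValueError("fft_size must be a power of two >= len(h)")
    step = fft_size - (m - 1)  # new input samples consumed per block
    # Zero-pad the short taps up to the FFT length and transform them once.
    H = fft(list(h) + [0.0] * (fft_size - m))
    # Prepend m-1 zeros so the first block has (zero) filter history.
    padded = [0.0] * (m - 1) + list(x)
    y = []
    for start in range(0, len(x), step):
        block = padded[start:start + fft_size]
        block += [0.0] * (fft_size - len(block))  # pad the last block
        yb = ifft([a * b for a, b in zip(fft(block), H)])
        # The first m-1 outputs are corrupted by circular wrap-around: discard.
        y.extend(v.real for v in yb[m - 1:])
    return y[:len(x)]


def direct_fir(x, h):
    """Reference time-domain FIR filter for checking the result."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]
```

For example, `overlap_save(x, h)` and `direct_fir(x, h)` should agree to within floating-point rounding for any input `x` and taps `h`. Processing each block independently, with a single precomputed filter spectrum `H`, is what makes the method map naturally onto many parallel FFTs; in this sketch, a short `h` is padded up to `fft_size`, so each block yields `fft_size - len(h) + 1` valid outputs.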
Keywords :
FIR filters; graphics processing units; FFT library; FIR filtering; GPU memory access; finite impulse response filter; graphics processors; overlapped save method; performance gain; short time-domain coefficients; symmetric segmentation; zero padding method; Algorithm design and analysis; Computer architecture; Finite impulse response filter; Graphics processing units; Instruction sets; Throughput;
Conference_Title :
Intelligent Control and Information Processing (ICICIP), 2012 Third International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-1-4577-2144-1
DOI :
10.1109/ICICIP.2012.6391489