DocumentCode :
3244188
Title :
Optimal Data Distribution for Versatile Finite Impulse Response Filtering on Next-Generation Graphics Hardware Using CUDA
Author :
Goorts, Patrik ; Rogmans, Sammy ; Bekaert, Philippe
Author_Institution :
Expertise Centre for Digital Media, Hasselt Univ. - tUL - IBBT, Diepenbeek, Belgium
fYear :
2009
fDate :
8-11 Dec. 2009
Firstpage :
300
Lastpage :
307
Abstract :
In this paper, we investigate discrete finite impulse response (FIR) filtering of images while harnessing the computational resources of next-generation GPUs. These platforms exhibit a massively data-parallel architecture with an advanced SIMT execution model and thread management, enabling designers to better cope with the memory wall, i.e. the growing gap between the cost of data communication and the cost of computational processing. However, these platforms still impose hard constraints that prevent trivial optimization of convolution filtering. Although automatic (compiler) optimization is available, we investigate and explain the speedup potential of manual intervention in the context of FIR kernels. Furthermore, we present multiple convolution implementation techniques that cope with the hard platform constraints in different situations while still tailoring the implementation to the underlying architecture. Using the acquired insights, we give an outlook on how loosening these hard constraints in the near future would affect the possibilities for optimization.
Keywords :
FIR filters; computer graphics; coprocessors; parallel architectures; CUDA; FIR kernel; SIMT execution model; compiler optimization; finite impulse response filtering; massive data parallel architecture; multiple convolution; next-generation graphics hardware; optimal data distribution; thread management; Constraint optimization; Convolution; Costs; Filtering; Finite impulse response filter; Graphics; Hardware; Memory management; Parallel architectures; Yarn; FIR; convolution; data distribution
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Systems (ICPADS), 2009 15th International Conference on
Conference_Location :
Shenzhen
ISSN :
1521-9097
Print_ISBN :
978-1-4244-5788-5
Type :
conf
DOI :
10.1109/ICPADS.2009.79
Filename :
5395277