DocumentCode :
1820028
Title :
Skeleton-based automatic parallelization of image processing algorithms for GPUs
Author :
Nugteren, Cedric ; Corporaal, Henk ; Mesman, Bart
Author_Institution :
Eindhoven Univ. of Technol., Eindhoven, Netherlands
fYear :
2011
fDate :
18-21 July 2011
Firstpage :
25
Lastpage :
32
Abstract :
Graphics Processing Units (GPUs) are becoming increasingly important in high performance computing. To maintain high quality solutions, programmers have to efficiently parallelize and map their algorithms. This task is far from trivial, leading to the necessity to automate this process. In this paper, we present a technique to automatically parallelize and map sequential code on a GPU, without the need for code annotations. This technique is based on skeletonization and is targeted at image processing algorithms. Skeletonization separates the structure of a parallel computation from the algorithm's functionality, enabling efficient implementations without requiring architecture knowledge from the programmer. We define a number of skeleton classes, each enabling GPU-specific parallelization techniques and optimizations, including automatic thread creation, on-chip memory usage and memory coalescing. Recently, similar skeletonization techniques have been applied to GPUs. Our work uses domain-specific skeletons and a finer-grained classification of algorithms. Comparing skeleton-based parallelization to existing GPU code generators in general, we potentially achieve a higher hardware efficiency by enabling algorithm restructuring through skeletons. In a set of benchmarks, we show that the presented skeleton-based approach generates highly optimized code, achieving high data throughput. Additionally, we show that the automatically generated code performs close to or equal to manually mapped and optimized code. We conclude that skeleton-based parallelization for GPUs is promising, but we do believe that future research must focus on the identification of a finer-grained and complete classification.
Keywords :
computer graphic equipment; coprocessors; image classification; parallel algorithms; program compilers; storage management; GPU code generator; GPU specific parallelization technique; automatic thread creation; code optimization; code-annotation; data throughput; finer-grained classification; graphics processing unit; high performance computing; image processing algorithm; memory coalescing; on-chip memory usage; parallel algorithm; sequential code map; skeleton-based automatic parallelization; Graphics processing unit; Hardware; Image processing; Instruction sets; Kernel; Skeleton; System-on-a-chip;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Embedded Computer Systems (SAMOS), 2011 International Conference on
Conference_Location :
Samos
Print_ISBN :
978-1-4577-0802-2
Electronic_ISBN :
978-1-4577-0801-5
Type :
conf
DOI :
10.1109/SAMOS.2011.6045441
Filename :
6045441