DocumentCode :
2963087
Title :
Design evaluation of OpenCL compiler framework for Coarse-Grained Reconfigurable Arrays
Author :
Hee-Seok Kim ; Minwook Ahn ; Stratton, J.A. ; Hwu, W.W.
Author_Institution :
Electr. & Comput. Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
fYear :
2012
fDate :
10-12 Dec. 2012
Firstpage :
313
Lastpage :
320
Abstract :
OpenCL is becoming one of the most popular parallel programming languages because it provides a standardized and portable programming model. However, adopting OpenCL for Coarse-Grained Reconfigurable Arrays (CGRAs) is challenging because their architectural capabilities diverge from those of GPUs. In particular, CGRAs are designed to accelerate loop execution by software pipelining on a grid of functional units, exploiting instruction-level parallelism. This differs greatly from a GPU, which executes data-parallel kernels using a large number of parallel threads. Therefore, an OpenCL compiler and runtime for CGRAs must map the threaded parallel programming model to a loop-parallel execution model so that the architecture can best utilize its resources. In this paper, we propose and evaluate a design for an OpenCL compiler framework for CGRAs. The proposed design is composed of a serializer and a post optimizer. The serializer transforms the parallel execution of work-items into an equivalent loop-based iterative execution in order to avoid expensive multithreading on CGRAs. The resulting code is further optimized by the post optimizer to maximize the coverage of software-pipelinable innermost loops. To achieve this goal, the post optimizer can apply various loop-level optimizations to the loops that the serializer introduces for iterative execution of OpenCL kernels. We analyze the proposed framework on a set of well-studied standard OpenCL kernels by comparing the performance of various implementations of the benchmarks.
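The serialization idea described in the abstract can be illustrated with a minimal sketch in plain C. This is not the paper's serializer: the kernel (a vector add), the function names `vec_add_work_item` and `vec_add_serialized`, and the one-dimensional index space are all assumptions chosen for illustration, and the sketch ignores barriers and local memory, which a real serializer would have to handle.

```c
#include <stddef.h>

/* Hypothetical kernel body: the work done by a single OpenCL work-item.
 * In OpenCL C, gid would be obtained from get_global_id(0); here it is
 * passed in explicitly so the body can be called from an ordinary loop. */
static void vec_add_work_item(size_t gid, const float *a, const float *b,
                              float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Serialized form (illustrative): instead of launching global_size
 * parallel work-items, iterate over the index space in one loop.
 * The result is a single innermost loop, the kind of loop that a
 * software-pipelining compiler can target. */
void vec_add_serialized(size_t global_size, const float *a, const float *b,
                        float *c)
{
    for (size_t gid = 0; gid < global_size; ++gid) {
        vec_add_work_item(gid, a, b, c);
    }
}
```

Calling `vec_add_serialized(n, a, b, c)` on three arrays of length `n` produces the same output as running `n` independent work-items, because each iteration touches only its own element.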
Keywords :
multi-threading; optimising compilers; parallel languages; pipeline processing; reconfigurable architectures; software performance evaluation; CGRA; OpenCL compiler framework; coarse-grained reconfigurable arrays; data execution; design evaluation; functional unit grid; innermost loop coverage maximization; instruction-level parallelism; iterative OpenCL kernel execution; loop-based iterative execution; loop-level optimizations; loop-parallel execution model; multithreading; parallel kernels; parallel programming languages; parallel threads; portable programming model; post optimizer; serializer; software pipelining; standardized programming model; threaded parallel programming model; Computer architecture; Graphics processing units; Hardware; Kernel; Optimization; Programming; CGRA; Coarse-Grained Reconfigurable Arrays; GPU; OpenCL; RP; SRP; Samsung Reconfigurable Processor;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Field-Programmable Technology (FPT), 2012 International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4673-2846-3
Electronic_ISBN :
978-1-4673-2844-9
Type :
conf
DOI :
10.1109/FPT.2012.6412155
Filename :
6412155