Title :
Deferring accelerator offloading decisions to application runtime
Author :
Vaz, Gavin ; Riebler, Heinrich ; Kenter, Tobias ; Plessl, Christian
Author_Institution :
Dept. of Comput. Sci., Univ. of Paderborn, Paderborn, Germany
Abstract :
Reconfigurable architectures provide an opportunity to accelerate a wide range of applications, frequently by exploiting data-parallelism, where the same operations are applied uniformly to a (large) set of data. However, when the sequential code is executed on a host CPU and only data-parallel loops are executed on an FPGA coprocessor, a sufficiently large number of loop iterations (trip count) is required so that the control- and data-transfer overheads to the coprocessor can be amortized. The trip count of large data-parallel loops is frequently not known at compile time, but only at runtime just before entering the loop. Therefore, we propose to generate code for both the CPU and the coprocessor, and to defer the decision of where to execute the code to application runtime, when the trip count of the loop can be determined. We demonstrate how an LLVM-based compiler toolflow can automatically insert appropriate decision blocks into the application code. Analyzing popular benchmark suites, we show that this kind of runtime decision is often applicable. The practical feasibility of our approach is demonstrated by a toolflow that automatically identifies loops suitable for vectorization and generates code for the FPGA coprocessor of a Convey HC-1. The toolflow adds decisions that compare the runtime-computed trip counts against loop-specific thresholds, and it also includes support for moving just the required data to the coprocessor. We evaluate the integrated toolflow with characteristic loops executed on different input data sizes.
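The runtime decision described in the abstract can be illustrated with a minimal C sketch. It is not the authors' generated code: the loop (a SAXPY kernel), the function names, and the threshold value are all hypothetical, and the coprocessor path is a stand-in that simply reuses the CPU loop, since the abstract does not describe the Convey HC-1 interface or how thresholds are chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-loop threshold. The abstract only states that trip
 * counts are compared to thresholds, not how the values are derived. */
#define SAXPY_OFFLOAD_THRESHOLD 4096

typedef enum { TARGET_CPU, TARGET_COPROCESSOR } target_t;

/* CPU version of the data-parallel loop. */
static void saxpy_cpu(float a, const float *x, float *y, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Stand-in for the coprocessor version (assumption). A real version
 * would move only x[0..n) and y[0..n) to coprocessor memory, start the
 * kernel, and copy y back. Here it just runs the CPU loop. */
static void saxpy_coproc(float a, const float *x, float *y, size_t n) {
    saxpy_cpu(a, x, y, n);
}

/* Decision block: offload only if the trip count is large enough for
 * the transfer overheads to be amortized. */
static target_t choose_target(size_t trip_count, size_t threshold) {
    return trip_count >= threshold ? TARGET_COPROCESSOR : TARGET_CPU;
}

/* Entry point with the decision inserted just before the loop, once the
 * trip count n is known. Returns the chosen target for inspection. */
target_t saxpy(float a, const float *x, float *y, size_t n) {
    assert(x != NULL && y != NULL);
    target_t target = choose_target(n, SAXPY_OFFLOAD_THRESHOLD);
    if (target == TARGET_COPROCESSOR) {
        saxpy_coproc(a, x, y, n);
    } else {
        saxpy_cpu(a, x, y, n);
    }
    return target;
}
```

In this sketch both code versions exist in the binary and the choice costs a single comparison per loop entry, so small inputs stay on the CPU without paying any transfer overhead.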
Keywords :
coprocessors; field programmable gate arrays; program compilers; program control structures; reconfigurable architectures; CPU; Convey HC-1; FPGA coprocessor; LLVM compiler based toolflow; accelerator offloading decision; application runtime; compile time; control-transfer overhead; data-parallelism; data-transfer overhead; decision block; large data-parallel loop; loop iteration; reconfigurable architecture; runtime-computed trip count; sequential code; Benchmark testing; Coprocessors; Field programmable gate arrays; Libraries; Memory management; Runtime; Vectors;
Conference_Title :
ReConFigurable Computing and FPGAs (ReConFig), 2014 International Conference on
Conference_Location :
Cancun
Print_ISBN :
978-1-4799-5943-3
DOI :
10.1109/ReConFig.2014.7032509