DocumentCode :
703825
Title :
Inter-tile reuse optimization applied to bandwidth constrained embedded accelerators
Author :
Peemen, Maurice ; Mesman, Bart ; Corporaal, Henk
Author_Institution :
Dept. of Electr. Eng., Eindhoven Univ. of Technol., Eindhoven, Netherlands
fYear :
2015
fDate :
9-13 March 2015
Firstpage :
169
Lastpage :
174
Abstract :
The adoption of High-Level Synthesis (HLS) tools has significantly reduced accelerator design time. A complex scaling problem that remains is the data transfer bottleneck. To scale up performance, accelerators require huge amounts of data and are often limited by interconnect resources. In addition, the energy spent by the accelerator is often dominated by the transfer of data, either in the form of memory references or data movement on the interconnect. In this paper we drastically reduce accelerator communication by exploring computation reordering and local buffer usage. Consequently, we present a new analytical methodology to optimize nested loops for inter-tile data reuse with loop transformations such as interchange and tiling. We focus on embedded accelerators that can be used in a multi-accelerator System on Chip (SoC), so performance, area, and energy are key in this exploration. 1) On three common embedded applications in the image/video processing domain (demosaicing, block matching, object detection), we show that our methodology reduces data movement by up to 2.1x compared to the best case of intra-tile optimization. 2) We demonstrate that our small accelerators (1-3% FPGA resources) can boost a simple MicroBlaze soft-core to the performance level of a high-end Intel-i7 processor.
Keywords :
buffer storage; circuit optimisation; embedded systems; graphics processing units; high level synthesis; multiprocessor interconnection networks; system-on-chip; MicroBlaze soft core; bandwidth constrained embedded accelerator; buffer usage; complex scaling problem; data transfer; embedded applications; high-end Intel-i7 processor; high-level synthesis; inter-tile reuse optimization; interconnect resource; loop transformation; multiaccelerator SoC; nested loop optimization; system on chip; Arrays; Bismuth; Cost function; Data transfer; Schedules; Steady-state;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015
Conference_Location :
Grenoble
Print_ISBN :
978-3-9815-3704-8
Type :
conf
Filename :
7092377