Regional Information Center for Science and Technology - Inter-tile reuse optimization applied to bandwidth constrained embedded accelerators

DocumentCode :

703825

Title :

Inter-tile reuse optimization applied to bandwidth constrained embedded accelerators

Author :

Peemen, Maurice ; Mesman, Bart ; Corporaal, Henk

Author_Institution :

Dept. of Electr. Eng., Eindhoven Univ. of Technol., Eindhoven, Netherlands

fYear :

2015

fDate :

9-13 March 2015

Firstpage :

169

Lastpage :

174

Abstract :

The adoption of High-Level Synthesis (HLS) tools has significantly reduced accelerator design time. A complex scaling problem that remains is the data transfer bottleneck. To scale up performance, accelerators require huge amounts of data, and they are often limited by interconnect resources. In addition, the energy spent by the accelerator is often dominated by the transfer of data, either in the form of memory references or data movement on the interconnect. In this paper we drastically reduce accelerator communication by exploring computation reordering and local buffer usage. Consequently, we present a new analytical methodology to optimize nested loops for inter-tile data reuse with loop transformations like interchange and tiling. We focus on embedded accelerators that can be used in a multi-accelerator System on Chip (SoC), so performance, area, and energy are key in this exploration. 1) On three common embedded applications in the image/video processing domain (demosaicing, block matching, object detection), we show that our methodology reduces data movement by up to 2.1x compared to the best case of intra-tile optimization. 2) We demonstrate that our small accelerators (1-3% of FPGA resources) can boost a simple MicroBlaze soft-core to the performance level of a high-end Intel-i7 processor.
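
To make the distinction between intra-tile and inter-tile reuse concrete, the following is a minimal sketch and not the authors' analytical methodology. It assumes a toy one-dimensional sliding-window kernel (each output reads `k` consecutive inputs) that is tiled into blocks of `tile` outputs, and it only counts how many input elements must be fetched from external memory. The function names and parameters are hypothetical and chosen for illustration.

```python
def loads_intra_tile(n_out: int, k: int, tile: int) -> int:
    """Count input loads when each tile refetches everything it needs.

    Data is reused only inside a tile: a tile of t outputs with a
    window of k inputs needs t + k - 1 input elements.
    """
    total = 0
    for start in range(0, n_out, tile):
        t = min(tile, n_out - start)  # last tile may be smaller
        total += t + k - 1
    return total


def loads_inter_tile(n_out: int, k: int, tile: int) -> int:
    """Count input loads when the overlap with the previous tile stays buffered.

    Assumption: the local buffer keeps the inputs of the previous tile,
    so only inputs beyond the already-buffered range are fetched.
    """
    total = 0
    buffered_end = 0  # one past the last input index held in the buffer
    for start in range(0, n_out, tile):
        t = min(tile, n_out - start)
        needed_end = start + t + k - 1       # one past the last input needed
        new_start = max(start, buffered_end)  # skip inputs already buffered
        total += needed_end - new_start
        buffered_end = needed_end
    return total


if __name__ == "__main__":
    n_out, k, tile = 16, 5, 4
    intra = loads_intra_tile(n_out, k, tile)
    inter = loads_inter_tile(n_out, k, tile)
    print(intra, inter)  # prints "32 20"
```

In this toy setting the overlap of `k - 1` inputs between neighbouring tiles is fetched once instead of once per tile, which is the kind of saving that inter-tile reuse targets. The figures it produces are unrelated to the 2.1x result reported in the abstract.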

Keywords :

buffer storage; circuit optimisation; embedded systems; graphics processing units; high level synthesis; multiprocessor interconnection networks; system-on-chip; MicroBlaze soft core; bandwidth constrained embedded accelerator; buffer usage; complex scaling problem; data transfer; embedded applications; high-end Intel-i7 processor; high-level synthesis; inter-tile reuse optimization; interconnect resource; loop transformation; multiaccelerator SoC; nested loop optimization; system on chip; Arrays; Bismuth; Cost function; Data transfer; Schedules; Steady-state;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015

Conference_Location :

Grenoble

Print_ISBN :

978-3-9815-3704-8

Type :

conf

Filename :

7092377

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=703825