DocumentCode :
3757196
Title :
A Distributed Memory Based Embedded CGRA for Accelerating Stencil Computations
Author :
Shohei Takeuchi;Yuttakon Yuttakonkit;Shinya Takamaeda-Yamazaki;Yasuhiko Nakashima
Author_Institution :
Grad. Sch. of Inf. Sci., Nara Inst. of Sci. &
fYear :
2015
Firstpage :
385
Lastpage :
391
Abstract :
Stencil computation is a basic but important operation pattern in many applications, such as image processing. Various GPU-based and application-specific hardware approaches have recently been proposed. However, the available energy capacity and hardware size are limited in embedded systems. An energy-efficient, small-footprint, high-performance accelerator is therefore necessary for constructing an intelligent computation platform. We develop an embedded CGRA accelerator with distributed on-chip memory blocks for stencil computation that is both energy- and memory-bandwidth-efficient. In this paper, we implemented a real LSI and an FPGA-based evaluation platform for it using Xilinx Zynq and Debian Linux. The evaluation results show that the accelerator achieves 2.5x higher performance and 2.3x lower energy consumption than the ARM core on the Zynq. We then estimated the performance and energy efficiency of the accelerator. The estimation results show that the accelerator, when manufactured in a 28 nm process, achieves 1.61x better energy efficiency than a mobile GPU.
Keywords :
"Hardware","Registers","Graphics processing units","Energy consumption","Jacobian matrices","Large scale integration","Computer architecture"
Publisher :
IEEE
Conference_Titel :
Computing and Networking (CANDAR), 2015 Third International Symposium on
Electronic_ISBN :
2379-1896
Type :
conf
DOI :
10.1109/CANDAR.2015.110
Filename :
7424743