DocumentCode :
3571349
Title :
Performance Evaluation of a 3D-Stencil Library for Distributed Memory Array Accelerators
Author :
Inagaki, Yoshikazu ; Takamaeda-Yamazaki, Shinya ; Yao, Jun ; Nakashima, Yasuhiko
Author_Institution :
Fujitsu Comput. Technol. Ltd., Kawasaki, Japan
fYear :
2014
Firstpage :
388
Lastpage :
393
Abstract :
EMAX (Energy-aware Multimode Accelerator Extension) is equipped with distributed single-port local memories and ring-formed interconnections. The accelerator is designed to achieve extremely high throughput for scientific computation, big data and image processing while keeping power consumption low. However, before mapping algorithms onto the accelerator, application developers need sufficient knowledge of the hardware organization and its specially designed instructions. Furthermore, when no well-designed compiler or library is available, they must make significant efforts to tune the code for execution efficiency. To address this problem, we focus on library support for stencil (nearest-neighbor) computations, a class of algorithms widely used in partial differential equation (PDE) solvers. In this research, we cover the following topics: (1) the system configuration, features and mnemonics of EMAX; (2) instruction mapping techniques that reduce the amount of data read from main memory; and (3) a performance evaluation of the library for PDE solvers. Because the library can reuse local data across outer-loop iterations and can map many instructions by unrolling outer loops, the amount of data read from main memory is reduced to as little as 1/7 of that required by hand-tuned code. In addition, the stencil library was found to reduce execution time by 23% compared with a general-purpose processor.
Keywords :
distributed processing; partial differential equations; storage management; 3D-stencil library; Big Data; EMAX features; EMAX mnemonics; EMAX system configuration; PDE solver; distributed memory array accelerators; distributed single-port local memory; energy-aware multimode accelerator extension; general purpose processor; hardware organization; image processing; instruction mapping techniques; library support; partial differential equation; performance evaluation; ring-formed interconnection; scientific computation; stencil computation; Arrays; Computers; Data communication; Kernel; Libraries; Memory management; Optimization; CGRA; accelerator; coarse grained reconfigurable architecture; library; optimization; stencil;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computing and Networking (CANDAR), 2014 Second International Symposium on
Type :
conf
DOI :
10.1109/CANDAR.2014.100
Filename :
7052215