DocumentCode :
55653
Title :
MAHA: An Energy-Efficient Malleable Hardware Accelerator for Data-Intensive Applications
Author :
Paul, Somnath ; Krishna, Aswin ; Qian, Wenchao ; Karam, Robert ; Bhunia, Swarup
Author_Institution :
Intel Corp., Santa Clara, CA, USA
Volume :
23
Issue :
6
fYear :
2015
fDate :
June 2015
Firstpage :
1005
Lastpage :
1016
Abstract :
For data-intensive applications, energy expended in on-chip computation constitutes only a small fraction of the total energy consumption. The primary contribution comes from transporting data between off-chip memory and on-chip computing elements, a limitation referred to as the Von-Neumann bottleneck. In such a scenario, improving the compute energy through parallel processing or on-chip hardware acceleration brings only minor improvements to the total energy requirement of the system. We note that an effective solution to mitigate the Von-Neumann bottleneck is to develop a framework that enables computing in off-chip nonvolatile memory arrays, where the data reside permanently. In this paper, we present a malleable hardware (MAHA) reconfigurable framework that modifies a nonvolatile CMOS-compatible flash memory array for on-demand reconfigurable computing. MAHA is a spatio-temporal mixed-granular hardware reconfigurable framework, which utilizes the memory for storage as well as lookup table-based computation (hence malleable) and uses a low-overhead hierarchical interconnect fabric for communication between processing elements. A detailed design of the malleable hardware together with a comprehensive application mapping flow is presented. Design overheads carefully estimated at the 45-nm technology node indicate that, for a set of common kernels, MAHA achieves a 91× improvement in energy efficiency over a software-only solution with negligible impact on memory performance in normal mode. The proposed design changes incur only 6% memory area overhead.
Keywords :
energy conservation; flash memories; power aware computing; MAHA reconfigurable framework; Von-Neumann bottleneck; complementary metal oxide semiconductors; data-intensive applications; energy consumption; energy-efficient malleable hardware accelerator; hierarchical interconnect fabric; lookup table-based computation; nonvolatile CMOS-compatible flash memory array; off-chip memory element; off-chip nonvolatile memory arrays; on-chip computation; on-chip computing element; on-chip hardware acceleration; parallel processing; processing elements; spatio-temporal mixed-granular hardware reconfigurable framework; Arrays; Flash; Hardware; Nonvolatile memory; System-on-chip; Table lookup; Energy efficiency; NAND flash; in-memory computing
fLanguage :
English
Journal_Title :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-8210
Type :
jour
DOI :
10.1109/TVLSI.2014.2332538
Filename :
6891374