DocumentCode :
1758715
Title :
A case for three-dimensional stacking of tightly coupled data memories over multi-core clusters using low-latency interconnects
Author :
Azarkhish, Erfan ; Loi, Igor ; Benini, Luca
Author_Institution :
DEI, Univ. of Bologna, Bologna, Italy
Volume :
7
Issue :
5
fYear :
2013
fDate :
September 2013
Firstpage :
191
Lastpage :
199
Abstract :
Shared tightly coupled data memories are key architectural elements for building multi-core clusters in programmable accelerators and embedded systems, as they provide a convenient shared memory abstraction while avoiding cache coherence overheads. The performance of these memories largely depends on the architecture of the interconnect between processing elements (PEs) and memory banks. The advent of three-dimensional (3D) technology has provided new opportunities to increase design modularity and reduce latency and manufacturing cost. In this study, the authors propose two 3D network architectures, both designed in synthesisable RTL: the C-logarithmic interconnect (C-LIN) and the Distributed logarithmic interconnect (D-LIN). These allow modular stacking of multiple L1 memory dies over a multi-core cluster with a limited number of PEs. The authors have used two through-silicon-via technologies: the state-of-the-art micro-bumps and the promising, denser Cu-Cu direct bonding. The overhead of electrostatic discharge protection circuits has also been considered. Architectural simulation results demonstrate that, in the processor-to-L1-memory context, C-LIN and D-LIN perform significantly better than traditional networks-on-chip and simple time-division multiplexing buses. Furthermore, post-layout results show that the proposed 3D architectures achieve speed comparable to that of their 2D counterparts while enabling modularity: L1 memory configurations from 256 kB to 2 MB with a single mask set.
Keywords :
cost reduction; electrostatic discharge; embedded systems; network-on-chip; shared memory systems; time division multiplexing; 3D architectures; 3D network architectures; C-LIN; C-logarithmic interconnect; D-LIN; PE; dense Cu-Cu direct bonding; design modularity; electrostatic discharge protection circuits; embedded systems; manufacturing cost reduction; memory banks; modular stacking; multicore clusters; network-on-chip; processing elements; processor-to-L1-memory context; programmable accelerators; shared memory abstraction; state-of-the-art microbumps; three-dimensional stacking; tightly coupled data memories; time-division multiplexing bus;
fLanguage :
English
Journal_Title :
Computers & Digital Techniques, IET
Publisher :
IET
ISSN :
1751-8601
Type :
jour
DOI :
10.1049/iet-cdt.2013.0031
Filename :
6584851
Link To Document :