DocumentCode :
22491
Title :
A Modular Shared L2 Memory Design for 3-D Integration
Author :
Azarkhish, Erfan ; Rossi, Davide ; Loi, Igor ; Benini, Luca
Author_Institution :
Dept. of Electr., Electron. & Inf. Eng., Univ. of Bologna, Bologna, Italy
Volume :
23
Issue :
8
fYear :
2015
fDate :
Aug. 2015
Firstpage :
1485
Lastpage :
1498
Abstract :
Its large required size and its tolerance to latency and to variations in memory access time make L2 memory a suitable option for 3-D integration. In this paper, we present a synthesizable 3-D-stackable L2 memory IP component, which can be attached to a cluster-based multicore platform through its network-on-chip interfaces offering high-bandwidth memory access with low average latency. Our design implements a scalable 3-D-nonuniform memory access (NUMA) architecture based on low latency logarithmic interconnects, which allows stacking of multiple identical memory dies (MDs), supports multiple outstanding transactions, and achieves high clock frequencies due to its highly pipelined nature. We implemented our design with STMicroelectronics CMOS-28-nm low-power technology and obtained a clock frequency of 500 MHz (limited by the access time of the memory arrays, whereas its logic components can operate up to 1 GHz), up to eight stacked dies (4 MB) with a memory density loss of 9%. Benchmark simulation results demonstrate that the addition of 3-D-NUMA to a multicluster system can lead to an average performance boost of 34%. Furthermore, experiments and estimations confirm that 3-D-NUMA is energy and power efficient (38% power reduction due to an architectural clock gating scheme), temperature friendly (over 40°C temperature reduction), and has unique features suitable for low-cost manufacturing (2.3× cost reduction due to identical MD layouts). Finally, 22% yield improvement is achievable in 3-D-NUMA compared with its 2-D counterparts, using the state-of-the-art through-silicon-via technologies.
Keywords :
CMOS memory circuits; integrated circuit design; integrated circuit interconnections; integrated memory circuits; network-on-chip; three-dimensional integrated circuits; 3D integration; 3D nonuniform memory access architecture; 3D-stackable L2 memory IP component; L2 memory design; NUMA architecture; STMicroelectronics CMOS low-power technology; clock frequency; cluster-based multicore platform; frequency 500 MHz; low latency logarithmic interconnects; memory access time; memory density loss; memory dies; multicluster system; network-on-chip interfaces; size 28 nm; through-silicon-via technologies; Clocks; IP networks; Pipeline processing; Random access memory; Stacking; System-on-chip; Through-silicon vias; 3-D integration; nonuniform memory access (NUMA); physical implementation; tightly coupled data memory;
fLanguage :
English
Journal_Title :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-8210
Type :
jour
DOI :
10.1109/TVLSI.2014.2340013
Filename :
6876020