  • DocumentCode
    22491
  • Title
    A Modular Shared L2 Memory Design for 3-D Integration
  • Author
    Azarkhish, Erfan; Rossi, Davide; Loi, Igor; Benini, Luca
  • Author_Institution
    Dept. of Electr., Electron. & Inf. Eng., Univ. of Bologna, Bologna, Italy
  • Volume
    23
  • Issue
    8
  • fYear
    2015
  • fDate
    Aug. 2015
  • Firstpage
    1485
  • Lastpage
    1498
  • Abstract
    Its large required size and its tolerance to latency and to variations in memory access time make L2 memory a suitable option for 3-D integration. In this paper, we present a synthesizable 3-D-stackable L2 memory IP component, which can be attached to a cluster-based multicore platform through its network-on-chip interfaces, offering high-bandwidth memory access with low average latency. Our design implements a scalable 3-D-nonuniform memory access (NUMA) architecture based on low-latency logarithmic interconnects, which allows stacking of multiple identical memory dies (MDs), supports multiple outstanding transactions, and achieves high clock frequencies due to its highly pipelined nature. We implemented our design with STMicroelectronics CMOS-28-nm low-power technology and obtained a clock frequency of 500 MHz (limited by the access time of the memory arrays, whereas its logic components can operate up to 1 GHz), with up to eight stacked dies (4 MB) and a memory density loss of 9%. Benchmark simulation results demonstrate that the addition of 3-D-NUMA to a multicluster system can lead to an average performance boost of 34%. Furthermore, experiments and estimations confirm that 3-D-NUMA is energy and power efficient (38% power reduction due to an architectural clock gating scheme), temperature friendly (over 40°C temperature reduction), and has unique features suitable for low-cost manufacturing (2.3× cost reduction due to identical MD layouts). Finally, a 22% yield improvement is achievable in 3-D-NUMA compared with its 2-D counterparts, using state-of-the-art through-silicon-via technologies.
  • Keywords
    CMOS memory circuits; integrated circuit design; integrated circuit interconnections; integrated memory circuits; network-on-chip; three-dimensional integrated circuits; 3D integration; 3D nonuniform memory access architecture; 3D-stackable L2 memory IP component; L2 memory design; NUMA architecture; STMicroelectronics CMOS low-power technology; clock frequency; cluster-based multicore platform; frequency 500 MHz; low latency logarithmic interconnects; memory access time; memory density loss; memory dies; multicluster system; network-on-chip interfaces; size 28 nm; through-silicon-via technologies; Clocks; IP networks; Pipeline processing; Random access memory; Stacking; System-on-chip; Through-silicon vias; 3-D integration; nonuniform memory access (NUMA); physical implementation; tightly coupled data memory;
  • fLanguage
    English
  • Journal_Title
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-8210
  • Type
    jour
  • DOI
    10.1109/TVLSI.2014.2340013
  • Filename
    6876020