• DocumentCode
    692869
  • Title

    Exploring DRAM organizations for energy-efficient and resilient exascale memories

  • Author

    Giridhar, B. ; Cieslak, Michael ; Duggal, Deepankar ; Dreslinski, Ronald ; Chen, Hsing Min ; Patti, Robert ; Hold, Betina ; Chakrabarti, Chaitali ; Mudge, Trevor ; Blaauw, D.

  • Author_Institution
    Univ. of Michigan, Ann Arbor, MI, USA
  • fYear
    2013
  • fDate
    17-22 Nov. 2013
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    The power target for exascale supercomputing is 20MW, with about 30% budgeted for the memory subsystem. Commodity DRAMs will not satisfy this requirement. Additionally, the large number of memory chips (>10M) required will result in crippling failure rates. Although specialized DRAM memories have been reorganized to reduce power through 3D-stacking or row buffer resizing, their implications for fault tolerance have not been considered. We show that addressing reliability and energy is a co-optimization problem involving tradeoffs between error correction cost, access energy and refresh power: reducing the physical page size to decrease access energy increases the energy/area overhead of error resilience. Additionally, power can be reduced by optimizing bitline lengths. The proposed 3D-stacked memory uses a page size of 4kb and consumes 5.1pJ/bit based on simulations with NEK5000 benchmarks. Scaled to 100PB, the memory consumes 4.7MW at 100PB/s, which is well within the total power budget (20MW), while also being error-resilient.
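    The figures quoted in the abstract can be cross-checked with a rough back-of-envelope calculation. The sketch below is not taken from the paper: the function names are hypothetical, a petabyte is assumed to be 10^15 bytes, and only the per-bit access energy is modeled. Under those assumptions the access energy alone comes to about 4.08MW, against a memory budget of about 6MW (30% of 20MW); the abstract does not break down the remainder of the reported 4.7MW, so the sketch leaves that gap unexplained.

```python
# Back-of-envelope check of the abstract's power figures.
# Assumptions (not from the source): decimal units (1 PB = 1e15 bytes),
# and only per-bit access energy is modeled.

PICOJOULE = 1e-12      # joules
PETABYTE = 1e15        # bytes (assumed decimal petabyte)
BITS_PER_BYTE = 8


def access_power_watts(energy_pj_per_bit, bandwidth_pb_per_s):
    """Power drawn by data accesses at the given sustained bandwidth."""
    bits_per_second = bandwidth_pb_per_s * PETABYTE * BITS_PER_BYTE
    return energy_pj_per_bit * PICOJOULE * bits_per_second


def memory_budget_watts(total_power_mw, memory_fraction):
    """Share of the system power target allotted to the memory subsystem."""
    return total_power_mw * 1e6 * memory_fraction


# Values quoted in the abstract: 5.1 pJ/bit, 100 PB/s, 20 MW, about 30%.
access_mw = access_power_watts(5.1, 100) / 1e6
budget_mw = memory_budget_watts(20, 0.30) / 1e6

print(f"access power:  {access_mw:.2f} MW")
print(f"memory budget: {budget_mw:.2f} MW")
```

    Both the access-only estimate and the reported 4.7MW fall below the assumed 6MW memory budget, which is consistent with the abstract's claim that the design fits the power target.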
  • Keywords
    DRAM chips; buffer storage; error correction; fault tolerant computing; mainframes; parallel machines; power aware computing; system recovery; 3D-stacked memory; 3D-stacking; DRAM memories; DRAM organizations; NEK5000 benchmarks; access energy; bitline lengths; commodity DRAM; cooptimization problem; crippling failure rates; energy-efficient resilient exascale memories; error correction cost; error resilience; exascale supercomputing; fault tolerance; memory chips; memory subsystem; power 20 MW; power 4.7 MW; row buffer resizing; Abstracts; Bandwidth; Error correction codes; Pins; Random access memory; Three-dimensional displays; Through-silicon vias;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SC), 2013 International Conference for
  • Conference_Location
    Denver, CO
  • Print_ISBN
    978-1-4503-2378-9
  • Type

    conf

  • DOI
    10.1145/2503210.2503215
  • Filename
    6877456