DocumentCode :
1799868
Title :
Citadel: Efficiently Protecting Stacked Memory from Large Granularity Failures
Author :
Nair, Prashant J. ; Roberts, David A. ; Qureshi, Moinuddin K.
fYear :
2014
fDate :
13-17 Dec. 2014
Firstpage :
51
Lastpage :
62
Abstract :
Stacked memory modules are likely to be tightly integrated with the processor. It is vital that these memory modules operate reliably, as memory failure can require the replacement of the entire socket. To make matters worse, stacked memory designs are susceptible to newer failure modes (for example, due to faulty through-silicon vias, or TSVs) that can cause large portions of memory, such as a bank, to become faulty. To avoid data loss from large-granularity failures, the memory system may use symbol-based codes that stripe the data for a cache line across several banks (or channels). Unfortunately, such data striping reduces memory-level parallelism, causing significant slowdown and higher power consumption. This paper proposes Citadel, a robust memory architecture that allows the memory system to retain each cache line within one bank, thus providing high performance and lower power while efficiently protecting the stacked memory from large-granularity failures. Citadel consists of three components: TSV-Swap, which can tolerate both faulty data-TSVs and faulty address-TSVs; Tri Dimensional Parity (3DP), which can tolerate column failures, row failures, and bank failures; and Dynamic Dual Granularity Sparing (DDS), which can mitigate permanent faults by dynamically sparing faulty memory regions at either a row granularity or a bank granularity. Our evaluations with real-world data for DRAM failures show that Citadel provides performance and power similar to maintaining the entire cache line in the same bank, and yet provides 700x higher reliability than Chip Kill-like ECC codes.
Keywords :
DRAM chips; cache storage; parallel memories; 3DP; Chip Kill-like ECC codes; Citadel; DDS; DRAM failures; TSV-Swap; bank failures; bank granularity; cache line; column failure; data-striping; dynamic dual granularity sparing; faulty address-TSV; faulty data-TSV; large-granularity failures; memory architecture; memory failure; memory level parallelism; row failures; row granularity; stacked memory design; stacked memory modules; stacked memory protection; symbol-based codes; through-silicon via; tri dimensional parity; Circuit faults; DRAM chips; Error correction codes; Reliability; Three-dimensional displays; Through-silicon vias; DRAM; Error Correcting Code; Faults; Resilience; Stacked Memory; Through Silicon Vias;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on
Conference_Location :
Cambridge
ISSN :
1072-4451
Type :
conf
DOI :
10.1109/MICRO.2014.57
Filename :
7011377