DocumentCode :
2241051
Title :
SAFER: Stuck-At-Fault Error Recovery for Memories
Author :
Seong, Nak Hee ; Woo, Dong Hyuk ; Srinivasan, Vijayalakshmi ; Rivers, Jude A. ; Lee, Hsien-Hsin S.
Author_Institution :
Sch. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
fYear :
2010
fDate :
4-8 Dec. 2010
Firstpage :
115
Lastpage :
124
Abstract :
As physical limitations such as limited charge threaten further DRAM scaling, alternative memory technologies, including several emerging non-volatile memories, are being explored as possible DRAM replacements. One main roadblock to wider adoption of these new memories is their limited write endurance, which leads to wear-out-related permanent failures. Furthermore, technology scaling increases the variation in cell lifetime, resulting in early failures of many cells. Existing error correcting techniques are primarily devised for recovering from transient faults and are not suitable for recovering from permanent stuck-at faults, which tend to increase gradually with repeated write cycles. In this paper, we propose SAFER, a novel hardware-efficient multi-bit stuck-at fault error recovery scheme for resistive memories, which can function in conjunction with existing wear-leveling techniques. SAFER exploits the key attribute that a failed cell with a stuck-at value is still readable, making it possible to continue using the failed cell to store data and thereby reducing the hardware overhead for error recovery. SAFER partitions a data block dynamically while ensuring that there is at most one failed bit per partition, and applies single-error correction within each partition to recover from failures. SAFER increases the number of recoverable failures and achieves better lifetime improvement with smaller hardware overhead relative to the recently proposed Error Correcting Pointers and even an ideal Hamming coding scheme.
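The partition-then-correct idea described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's design: it assumes that the per-partition single-error correction is a one-bit inversion flag (a partition is written complemented when the data disagrees with its stuck cell, so the readable stuck value stays useful), that partitions are formed by selecting k bits of each cell's bit-position index, and that the flag bits themselves are fault-free. BLOCK_BITS, partition_of, find_partition, write_block and read_block are hypothetical names, and the exhaustive search over index bits stands in for whatever incremental repartitioning a hardware implementation would use.

```python
from itertools import combinations

BLOCK_BITS = 32                              # hypothetical block size (power of two)
INDEX_BITS = BLOCK_BITS.bit_length() - 1     # bits needed to index a cell (5 for 32)


def partition_of(pos, selected):
    """Group id of bit position `pos`: the concatenation of the selected index bits."""
    gid = 0
    for b in selected:
        gid = (gid << 1) | ((pos >> b) & 1)
    return gid


def find_partition(faults, k):
    """Pick k index bits so that no group holds more than one stuck-at cell.

    `faults` maps bit position -> stuck value. Returns None if no choice works
    (e.g. more faults than the 2**k groups can separate).
    """
    for selected in combinations(range(INDEX_BITS), k):
        groups = [partition_of(p, selected) for p in faults]
        if len(groups) == len(set(groups)):
            return selected
    return None


def write_block(data, faults, selected):
    """Store `data` (list of 0/1) into cells, some of which are stuck.

    Returns (stored cell values, per-group inversion flags).
    """
    flags = [0] * (1 << len(selected))
    # Invert a group only if its single stuck cell disagrees with the data bit.
    for pos, stuck in faults.items():
        if data[pos] != stuck:
            flags[partition_of(pos, selected)] = 1
    stored = []
    for pos, bit in enumerate(data):
        value = bit ^ flags[partition_of(pos, selected)]
        if pos in faults:
            value = faults[pos]  # a stuck cell ignores the write but remains readable
        stored.append(value)
    return stored, flags


def read_block(stored, selected, flags):
    """Undo the per-group inversion to recover the original data bits."""
    return [v ^ flags[partition_of(pos, selected)] for pos, v in enumerate(stored)]


if __name__ == "__main__":
    faults = {3: 1, 7: 0, 19: 1}             # hypothetical stuck-at cells
    selected = find_partition(faults, 2)     # 4 groups, one fault per group at most
    data = [0] * BLOCK_BITS
    stored, flags = write_block(data, faults, selected)
    print(read_block(stored, selected, flags) == data)
```

Because each group contains at most one stuck cell, a single flag per group is enough: either the stuck value already matches the data bit, or complementing the whole group makes the intended value equal to the stuck value, and the read path reverses the complement.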
Keywords :
DRAM chips; Hamming codes; error correction; error correction codes; fault diagnosis; phase change memories; system recovery; DRAM scaling; error correcting pointer; Hamming coding; multi-bit error correction; resistive memory; stuck-at fault error recovery; phase-change memory; reliability; stuck-at fault recovery; write endurance;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Microarchitecture (MICRO), 2010 43rd Annual IEEE/ACM International Symposium on
Conference_Location :
Atlanta, GA
ISSN :
1072-4451
Print_ISBN :
978-1-4244-9071-4
Type :
conf
DOI :
10.1109/MICRO.2010.46
Filename :
5695530