Title :
Cache and memory error detection, correction, and reduction techniques for terrestrial servers and workstations
Author :
Slayman, Charles W.
Author_Institution :
Sun Microsystems Inc., Santa Clara, CA, USA
Abstract :
As the size of the SRAM cache and DRAM memory grows in servers and workstations, cosmic-ray errors are becoming a major concern for systems designers and end users. Several techniques exist to detect and mitigate the occurrence of cosmic-ray upset, such as error detection, error correction, cache scrubbing, and array interleaving. This paper covers the tradeoffs of these techniques in terms of area, power, and performance penalties versus increased reliability. In most system applications, a combination of several techniques is required to meet the necessary reliability and data-integrity targets.
Keywords :
DRAM chips; SRAM chips; cache storage; error correction codes; error detection codes; failure analysis; fault tolerance; network servers; radiation effects; workstations; DRAM memory; SRAM cache; array interleaving; cache scrubbing; cosmic ray errors; data integrity targets; error correction code; error detection code; reduction techniques; reliability; soft error rate; terrestrial servers; terrestrial workstations; Blades; Error correction; File servers; Neutrons; Power system reliability; Random access memory; Switches; Cosmic-ray upset; error correction code (ECC); memory fault tolerance; soft-error rate (SER)
Journal_Title :
Device and Materials Reliability, IEEE Transactions on
DOI :
10.1109/TDMR.2005.856487