Title :
Reducing Soft Errors through Operand Width Aware Policies
Author :
Ergin, Oguz ; Unsal, Osman S. ; Vera, Xavier ; Gonzalez, Adriana
Author_Institution :
Dept. of Comput. Eng., TOBB Univ. of Econ. & Technol., Ankara, Turkey
Abstract :
Soft errors are an important challenge in contemporary microprocessors. Particle hits on processor components are expected to create an increasing number of transient errors with each new microprocessor generation. In this paper, we propose simple mechanisms that effectively reduce a processor's vulnerability to soft errors. Our designs are motivated by the fact that many of the values produced and consumed in a processor are narrow, so their upper-order bits carry no information. Soft errors caused by a particle strike on these upper-order bits can be avoided simply by identifying the narrow values. Alternatively, soft errors in narrow values can be detected or corrected by replicating the vulnerable portion of the value inside the storage space provided for the upper-order bits of these operands. As a faster but less fault-tolerant alternative to ECC and parity, we offer a variety of schemes that make use of narrow values and analyze their efficiency in reducing the soft error vulnerability of different data-holding components of a processor. On average, techniques that exploit value narrowness provide 49 percent error detection, 45 percent error correction, or 27 percent error avoidance coverage for single-bit upsets in the first-level data cache across all Spec2K benchmarks. In other structures, such as the immediate field of the issue queue, an average error detection rate of 64 percent is achieved.
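The replication idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's actual hardware design: the 64-bit word size, the 32-bit narrowness threshold, the separate "narrow" flag bit, and the function names `is_narrow`, `encode`, and `decode` are all assumptions made for illustration. A narrow value is stored with a copy of its low half in the otherwise unused upper half, and a mismatch between the two halves on read signals a single-bit upset.

```python
WORD_BITS = 64          # assumed storage word size
HALF_BITS = 32          # assumed narrowness threshold
WORD_MASK = (1 << WORD_BITS) - 1
HALF_MASK = (1 << HALF_BITS) - 1


def is_narrow(word):
    """True if the word is just the sign extension of its low 32 bits."""
    word &= WORD_MASK
    upper = word >> (HALF_BITS - 1)   # bit 31 and all bits above it (33 bits)
    return upper == 0 or upper == (1 << (WORD_BITS - HALF_BITS + 1)) - 1


def encode(word):
    """Return (stored_word, narrow_flag); the flag is a hypothetical extra bit."""
    word &= WORD_MASK
    if not is_narrow(word):
        return word, False            # wide value: stored unprotected
    low = word & HALF_MASK
    return (low << HALF_BITS) | low, True   # copy low half into upper half


def decode(stored, narrow):
    """Return (value, error_detected) for a stored word and its flag."""
    if not narrow:
        return stored, False          # nothing to compare against
    low = stored & HALF_MASK
    high = stored >> HALF_BITS
    error = low != high               # the two copies disagree
    # Rebuild the original value by sign-extending the low half.
    value = low | (HALF_MASK << HALF_BITS) if low >> (HALF_BITS - 1) else low
    return value, error


stored, flag = encode(5)
flipped = stored ^ (1 << 40)          # simulate a particle strike on bit 40
print(decode(stored, flag))           # (5, False)
print(decode(flipped, flag))          # (5, True)
```

This sketch only detects errors. Correcting them would need enough spare bits for a third copy and a majority vote, which in turn requires a stricter narrowness threshold.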
Keywords :
error correction; error detection; microprocessor chips; reliability; storage management chips; error avoidance coverage; first level data cache; microprocessors; operand width aware policy; particle strike; soft error vulnerability; storage space; transient errors; Memory Structures; Memory structures-reliability; Processor Architectures; Reliability, Testing, and Fault-Tolerance; narrow values; soft errors
Journal_Title :
Dependable and Secure Computing, IEEE Transactions on
DOI :
10.1109/TDSC.2008.18