Title :
IBM z990 soft error detection and recovery
Author :
Meaney, Patrick J. ; Swaney, Scott B. ; Sanda, Pia N. ; Spainhower, Lisa
Author_Institution :
Technol. Group, IBM Syst., Poughkeepsie, NY, USA
Abstract :
Soft errors in logic are becoming more significant in the design of computer systems due to the increased sensitivity of latches and combinatorial logic and the increased number of transistors on a chip. At the same time, users of computer systems continue to expect higher levels of system reliability. Therefore, the investment in hardware, firmware, and software mitigation is likely to continue to rise. The IBM eServer z990 system is designed to detect and recover from myriad instances of soft and permanent errors. The error detection and recovery within the z990 processors and the "nest" chips are described with respect to system-level protection against soft errors.
Keywords :
error correction codes; error detection codes; integrated circuit reliability; microprocessor chips; multichip modules; system recovery; IBM eServer z990 system; combinatorial logic; computer systems; error correcting code; latches sensitivities; permanent errors; single event upset; soft error detection; soft error rate; soft error recovery; software mitigation; system level protection; system reliability; z990 processors; CMOS technology; Circuits; Computer errors; Hardware; Latches; Logic design; Microprogramming; Protection; Reliability; Error-correcting code (ECC); error detection; recovery; reliability, availability, and serviceability (RAS); single-event upset (SEU); soft error rate (SER)
Journal_Title :
Device and Materials Reliability, IEEE Transactions on
DOI :
10.1109/TDMR.2005.859577