Title :
Verification for fault tolerance of the IBM System z microprocessor
Author :
Thompto, Brian W. ; Hoppe, Bodo
Author_Institution :
Syst. & Technol. Group, IBM, Austin, TX, USA
Abstract :
IBM System z processors are known for their industry-leading Reliability, Availability and Serviceability (RAS). The hardware is designed to be highly resilient against errors and to recover from them while maintaining a valid architectural state. This paper describes the thorough verification effort required to prove, prior to design tape-out, that the fault tolerance of the IBM System z processor core meets these high expectations. It proposes a multifaceted verification methodology that covers the various aspects of verifying correct error detection, isolation and recovery. Soft errors enlarge the state space of a design significantly, which poses a major challenge for the functional verification environment: it must tolerate the failures while still expecting architectural compliance. Several fault injection mechanisms are discussed. A special focus is on the novel methodology of Comprehensive Fault Injection (CFI), used to validate and improve the dependability characteristics of the processor core, providing improved Soft Error Resilience (SER). Feedback of the results, together with measurements of efficiency and functional coverage, is an integral part of the overall methodology and allows smart use of the available compute resources.
Keywords :
Decision support systems; Fault tolerant systems; Microprocessors; CFI; RAS; SER; error detection; error recovery; fault injection
Conference_Title :
Design Automation Conference (DAC), 2010 47th ACM/IEEE
Conference_Location :
Anaheim, CA, USA
Print_ISBN :
978-1-4244-6677-1