Title :
Compiler-Managed Software-based Redundant Multi-Threading for Transient Fault Detection
Author :
Wang, Cheng ; Kim, Ho-seop ; Wu, Youfeng ; Ying, Victor
Author_Institution :
Programming Syst. Lab., Intel Corp., Santa Clara, CA
Abstract :
As transistors become smaller and faster, with tighter noise margins, modern processors are increasingly susceptible to transient hardware faults. Existing hardware-based redundant multi-threading (HRMT) approaches rely mostly on special-purpose hardware to replicate the program into redundant execution threads and compare their computation results. In this paper, we present a software-based redundant multi-threading (SRMT) approach for transient fault detection. Our SRMT technique uses the compiler to automatically generate redundant threads that can run on general-purpose chip multiprocessors (CMPs). We exploit high-level program information available at compile time to optimize data communication between the redundant threads. Furthermore, our software-based technique provides a flexible program execution environment in which legacy binary code and reliability-enhanced code can coexist in a mix-and-match fashion, depending on the desired level of reliability and software compatibility. Our experimental results show that compiler analysis and optimization techniques can reduce data communication requirements by up to 88% compared with HRMT. With general-purpose intra-chip communication mechanisms in CMP machines, SRMT overhead can be as low as 19%. Moreover, SRMT achieves error coverage rates of 99.98% and 99.6% for the SPEC CPU2000 integer and floating-point benchmarks, respectively. These results demonstrate that SRMT is competitive with HRMT approaches.
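To make the redundant-thread idea in the abstract concrete, the following is a minimal, illustrative sketch. It is not the authors' compiler output or runtime. It assumes a simplified model in which a leading thread performs the real memory loads and stores and forwards each loaded value and each value it is about to store to a trailing thread. The trailing thread re-executes the same computation on the forwarded inputs and compares its result against the leading thread's store value. A Python queue.Queue stands in for the intra-chip communication mechanism. The names leading, trailing, run, and inject_fault_at are hypothetical, and the bit flip only simulates a transient fault.

```python
import queue
import threading


def leading(memory, indices, out_q, inject_fault_at=None):
    """Hypothetical leading thread: the only thread that touches memory."""
    for step, i in enumerate(indices):
        loaded = memory[i]              # real load, done once by the leading thread
        out_q.put(("load", loaded))     # forward the loaded value to the trailing thread
        result = loaded * 2 + 1         # the computation that both threads execute
        if step == inject_fault_at:
            result ^= 1                 # simulate a transient single-bit flip
        out_q.put(("store", result))    # forward the value about to be stored
        memory[i] = result              # store commits without waiting (simplification)
    out_q.put(("done", None))


def trailing(in_q, errors):
    """Hypothetical trailing thread: recompute from forwarded inputs and compare."""
    while True:
        kind, value = in_q.get()
        if kind == "done":
            return
        expected = value * 2 + 1        # redundant re-execution on the forwarded load
        _, stored = in_q.get()          # the leading thread's store value
        if stored != expected:
            errors.append((expected, stored))  # mismatch: transient fault detected


def run(memory, inject_fault_at=None):
    """Run both threads over every element of memory; return detected mismatches."""
    q = queue.Queue()
    errors = []
    checker = threading.Thread(target=trailing, args=(q, errors))
    checker.start()
    leading(memory, range(len(memory)), q, inject_fault_at)
    checker.join()
    return errors
```

For example, run([1, 2, 3]) reports no mismatches, while run([1, 2, 3], inject_fault_at=1) returns [(5, 4)]: the trailing thread expected 5 but the leading thread stored the corrupted value 4. In this simplified sketch the faulty store still reaches memory, so the scheme only detects the fault and does not prevent it.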
Keywords :
microprocessor chips; multi-threading; program compilers; program diagnostics; SPEC CPU2000 integer; chip multiprocessors; compiler analysis; compiler-managed software-based redundant multithreading; floating-point benchmarks; hardware-based redundant multithreading; intra-chip communication mechanisms; legacy binary codes; program execution; software-based redundant multi-threading; transient fault detection; Application software; Binary codes; Data communication; Fault detection; Hardware; Libraries; Multithreading; Optimizing compilers; Program processors; Yarn;
Conference_Title :
Code Generation and Optimization, 2007. CGO '07. International Symposium on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-2764-7