Title :
Efficient Transient-Fault Tolerance for Multithreaded Processors Using Dual-Thread Execution
Author :
Ma, Yi; Zhou, Huiyang
Author_Institution :
Univ. of Central Florida, Orlando
Abstract :
Reliability is becoming a key issue in computer system design as microprocessors grow increasingly susceptible to transient faults. Many previously proposed schemes exploit simultaneous multithreaded (SMT) architectures to achieve transient-fault tolerance by running a program concurrently on two threads: a main thread and a redundant checker thread. Such schemes, however, often incur high performance overheads due to resource contention and redundancy checking. In this paper, we propose dual-thread execution (DTE) for SMT processors to achieve transient-fault tolerance efficiently. DTE is derived from the recently proposed fault-tolerant dual-core execution (FTDCE) paradigm, in which two processor cores on a single chip perform redundant execution to improve both reliability and performance. We apply the same principles as FTDCE to SMT architectures and explore fetch policies to address the critical issue of resource sharing in SMT architectures. Our experimental results show that DTE achieves an average speedup of 56.1% over the previously proposed simultaneously and redundantly threaded processor with recovery (SRTR). More impressively, even compared to single-thread execution, DTE achieves full-coverage transient-fault tolerance along with an average performance improvement of 15.5%.
Keywords :
checkpointing; computer architecture; fault tolerant computing; multi-threading; resource allocation; computer system design; dual-thread execution; redundant checker thread; resource-sharing issue; simultaneous multithreaded architecture; transient-fault tolerance; Computer architecture; Computer science; Fault detection; Fault tolerance; Microprocessors; Protection; Redundancy; Voltage; microprocessors; multi-threaded architectures; redundant systems
Conference_Title :
Computer Design, 2006. ICCD 2006. International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
978-0-7803-9707-1
ISSN :
1063-6404
DOI :
10.1109/ICCD.2006.4380804