DocumentCode :
2066319
Title :
Efficient Transient-Fault Tolerance for Multithreaded Processors Using Dual-Thread Execution
Author :
Ma, Yi ; Zhou, Huiyang
Author_Institution :
Univ. of Central Florida, Orlando
fYear :
2007
fDate :
1-4 Oct. 2007
Firstpage :
120
Lastpage :
126
Abstract :
Reliability has become a key issue in computer system design as microprocessors are increasingly susceptible to transient faults. Many previously proposed schemes exploit simultaneous multithreaded (SMT) architectures to achieve transient-fault tolerance by running a program concurrently on two threads, a main thread and a redundant checker thread. Such schemes, however, often incur high performance overheads due to resource contention and redundancy checking. In this paper, we propose dual-thread execution (DTE) for SMT processors to achieve transient-fault tolerance efficiently. DTE is derived from the recently proposed fault-tolerant dual-core execution (FTDCE) paradigm, in which two processor cores on a single chip perform redundant execution to improve both reliability and performance. We apply the same principles as FTDCE to SMT architectures and explore fetch policies to address the critical resource-sharing issue in SMT architectures. Our experimental results show that DTE achieves an average speedup of 56.1% over the previously proposed simultaneously and redundantly threaded processor with recovery (SRTR). More impressively, even compared to single-thread execution, DTE provides full-coverage transient-fault tolerance along with an average performance improvement of 15.5%.
Keywords :
checkpointing; computer architecture; fault tolerant computing; multi-threading; resource allocation; computer system design; dual-thread execution; redundant checker thread; resource-sharing issue; simultaneous multithreaded architecture; transient-fault tolerance; Computer architecture; Computer science; Fault detection; Fault tolerance; Microprocessors; Protection; Redundancy; Surface-mount technology; Voltage; Yarn; Fault tolerance; microprocessors; multi-threaded architectures; redundant systems;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Design, 2006. ICCD 2006. International Conference on
Conference_Location :
San Jose, CA
ISSN :
1063-6404
Print_ISBN :
978-0-7803-9707-1
Electronic_ISBN :
1063-6404
Type :
conf
DOI :
10.1109/ICCD.2006.4380804
Filename :
4380804