DocumentCode
688408
Title
Device View Redundancy: An Adaptive Low-Overhead Fault Tolerance Mechanism for Many-Core System
Author
Wentao Jia ; Chunyuan Zhang ; Jian Fu
Author_Institution
Nat. Key Lab. of Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
fYear
2013
fDate
13-15 Nov. 2013
Firstpage
2080
Lastpage
2087
Abstract
The continued increase in fault rates in integrated circuits makes processors, especially many-core processors, more susceptible to errors. Meanwhile, most systems and applications do not need full fault coverage, which incurs excessive overhead, so on-demand fault tolerance is desirable for them. In this paper, we propose an adaptive low-overhead fault tolerance mechanism for many-core systems, called Device View Redundancy (DVR). It treats fault tolerance as a device that an application can configure and use when high reliability is needed. In addition, DVR exploits idle resources for low-overhead fault tolerance, based on the observation that the utilization of many-core systems is low due to the lack of parallelism in applications. Experiments show that DVR reduces performance overhead by 16% to 98% compared with full Dual Modular Redundancy (DMR).
Keywords
fault tolerant computing; multiprocessing systems; DVR; adaptive low-overhead fault tolerance mechanism; device view redundancy; many-core processor; many-core system; on-demand fault tolerance; Fault tolerant systems; Hardware; Performance evaluation; Redundancy; Registers; dynamic coupling; idle resource exploitation; low-overhead; many core system; on-demand redundancy
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC_EUC), 2013 IEEE 10th International Conference on
Conference_Location
Zhangjiajie
Type
conf
DOI
10.1109/HPCC.and.EUC.2013.299
Filename
6832182