Title :
REE: Exploiting idempotent property of applications for fault detection and recovery
Author :
Jianli Li ; Qingping Tan ; Lanfang Tan ; Tongchuan Xin
Author_Institution :
Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
As semiconductor technologies scale down to deep sub-micron dimensions, transient faults will soon become a critical reliability concern. This paper presents the Reliability Enhancement Exploiting (REE) technique, a software-implemented fault tolerance solution that exploits the idempotent property of applications. An idempotent region of code is one that can be re-executed multiple times and still produce the same, correct result. By inserting extra instructions into an idempotent region so that the region is re-executed, REE can detect transient faults that occur while the region executes. Once a fault is detected, REE recovers from it by executing the idempotent region again. To the best of our knowledge, this is the first work to exploit the idempotent property for fault detection. With fault coverage similar to that of a classic solution, REE reduces memory overhead and performance overhead by 71.8% and 31.3%, respectively.
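The detect-by-re-execution idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes instrumenting extra instructions into the code, whereas the sketch below works at the function level. The names `run_with_ree`, `make_faulty_sum`, and `max_retries`, as well as the simulated bit flip, are hypothetical and introduced only for illustration.

```python
from typing import Callable, Iterable, Set, Tuple, TypeVar

T = TypeVar("T")


def run_with_ree(region: Callable[[], T], max_retries: int = 3) -> Tuple[T, int]:
    """Run an idempotent region twice and compare the two results.

    A mismatch is treated as a detected transient fault, and the region is
    executed again (recovery). Returns (result, number_of_detected_faults).
    Assumption: `region` is idempotent, so re-running it is always safe.
    """
    detected = 0
    for _ in range(max_retries + 1):
        first = region()
        second = region()  # redundant execution used only for detection
        if first == second:
            return first, detected
        detected += 1  # results disagree: assume a transient fault occurred
    raise RuntimeError("results kept disagreeing; fault may not be transient")


def make_faulty_sum(data: Iterable[int], faulty_calls: Set[int]) -> Callable[[], int]:
    """Build an idempotent region (a sum) with a simulated fault.

    The fault flips bit 3 of the result on the call numbers in `faulty_calls`.
    This injector is purely hypothetical test scaffolding.
    """
    values = list(data)
    calls = {"n": 0}

    def region() -> int:
        calls["n"] += 1
        total = sum(values)
        if calls["n"] in faulty_calls:
            total ^= 1 << 3  # simulated single-bit flip
        return total

    return region


if __name__ == "__main__":
    region = make_faulty_sum([1, 2, 3, 4], faulty_calls={2})
    result, detected = run_with_ree(region)
    print(result, detected)
```

In this sketch the second execution of the first attempt is corrupted, the comparison fails, and the next attempt yields two matching results. Comparing two executions cannot catch a fault that corrupts both runs identically, which is one reason the sketch should be read as an illustration of the principle rather than a complete scheme.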
Keywords :
fault diagnosis; fault tolerance; integrated circuit reliability; semiconductor technology; REE; fault coverage; fault detection; fault recovery; idempotent region; memory overhead; performance overhead; reliability enhancement exploiting technique; Circuit faults; Fault tolerant systems; Hardware; Program processors; Transient analysis; Idempotent property; Transient faults
Conference_Title :
Natural Computation (ICNC), 2013 Ninth International Conference on
Conference_Location :
Shenyang
DOI :
10.1109/ICNC.2013.6818241