DocumentCode :
1946448
Title :
Toward efficient check-pointing and rollback under on-demand SBST in chip multi-processors
Author :
Skitsas, Michael A. ; Nicopoulos, Chrysostomos A. ; Michael, Maria K.
Author_Institution :
KIOS Res. Center, Univ. of Cyprus, Nicosia, Cyprus
fYear :
2015
fDate :
6-8 July 2015
Firstpage :
110
Lastpage :
115
Abstract :
In-field on-line testing techniques have recently been proposed for detecting permanent faults caused by wear-out/aging-related defects that manifest during the lifetime of a system. Selective Software-Based Self-Testing (SBST) is one such paradigm. It focuses primarily on the recently stressed functional units of a multicore system at a sub-core granularity, in an attempt to reduce the application performance penalty caused by periodically testing the entire system. In this work, we extend our O/S-enabled framework for on-demand (selective) SBST, DaemonGuard, to support fault recovery. Towards this goal, we propose an efficient check-pointing and rollback recovery mechanism which, upon fault detection, restores the system to the most recent valid state and resumes normal operation with the faulty core disabled, thereby leading to a healthy (but degraded) system. The work in this paper concentrates on reducing the number of stored checkpoints required when testing at a sub-core granularity, and on minimizing the recovery penalty of such a framework. We evaluate the overhead of the proposed recovery mechanism, and our results indicate a practical reduction in the number of stored checkpoints as well as a significant improvement in recovery latency for the cases where the faults are correlated with the stressed units.
Keywords :
automatic test software; fault diagnosis; microprocessor chips; multiprocessing systems; DaemonGuard; O-S-enabled framework; aging-related defects; check-pointing; chip multiprocessors; fault recovery capabilities; in-field online testing techniques; multicore system; on-demand SBST; permanent fault detection; rollback recovery mechanism; selective software-based self-testing; stressed functional units; subcore granularity; wear-out defects; Aging; Built-in self-test; Checkpointing; Fault detection; Hardware; Multicore processing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
On-Line Testing Symposium (IOLTS), 2015 IEEE 21st International
Conference_Location :
Halkidiki
Type :
conf
DOI :
10.1109/IOLTS.2015.7229842
Filename :
7229842