Title :
Efficient software-based fault tolerance approach on multicore platforms
Author :
Mushtaq, Hamid ; Al-Ars, Zaid ; Bertels, Koen
Author_Institution :
Computer Engineering Laboratory, Delft University of Technology, Netherlands
Abstract :
This paper describes a low-overhead, software-based fault tolerance approach for shared-memory multicore systems. The scheme is implemented at the user-space level and requires almost no changes to the original application. Redundant multithreaded processes are used to detect soft errors and recover from them. Our scheme ensures that the redundant processes execute identically even in the presence of non-determinism due to shared memory accesses, and it does so with a very low-overhead mechanism. Moreover, it implements a fast error detection and recovery mechanism. The overhead incurred by our approach ranges from 0% to 18% for selected benchmarks, which is lower than that of comparable systems published in the literature.
Keywords :
Benchmark testing; Clocks; Fault tolerance; Fault tolerant systems; Instruction sets; Libraries; Synchronization;
Conference_Title :
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013
Conference_Location :
Grenoble, France
Print_ISBN :
978-1-4673-5071-6
DOI :
10.7873/DATE.2013.194