DocumentCode :
3722941
Title :
Toward a Fault-Tolerance Framework for COTS Many-Core Systems
Author :
Peter Munk;Mohammad Shadi Alhakeem;Raphael Lisicki;Helge Parzyjegla;Jan Richling; Hei?
Author_Institution :
Corp. Sector Res. &
fYear :
2015
Firstpage :
167
Lastpage :
177
Abstract :
Commercial-off-the-shelf (COTS) many-core processors offer the performance needed for computationally intensive, safety-critical real-time applications such as autonomous driving. However, these consumer-grade many-core processors are increasingly susceptible to faults because of their highly integrated design. In this paper, we present a fault-tolerance framework that eases the use of COTS many-core processors for safety-critical applications. Our framework employs an adaptable software-based fault-tolerance mechanism that combines N-Modular Redundancy (NMR) with a repair process and a rejuvenating round-robin voting scheme. A Stochastic Activity Network (SAN) model of the fault-tolerance mechanism allows the framework to adapt the parameters of the mechanism such that a specified target availability is achieved with minimum overhead. Experiments on a cycle-accurate simulator empirically prove the correctness of the SAN model and evaluate the overhead of the framework.
Keywords :
"Fault tolerance","Fault tolerant systems","Maintenance engineering","Program processors","Adaptation models","Nuclear magnetic resonance"
Publisher :
IEEE
Conference_Titel :
Dependable Computing Conference (EDCC), 2015 Eleventh European
Type :
conf
DOI :
10.1109/EDCC.2015.32
Filename :
7371964