DocumentCode
3722941
Title
Toward a Fault-Tolerance Framework for COTS Many-Core Systems
Author
Peter Munk;Mohammad Shadi Alhakeem;Raphael Lisicki;Helge Parzyjegla;Jan Richling; Hei?
Author_Institution
Corp. Sector Res. &
fYear
2015
Firstpage
167
Lastpage
177
Abstract
Commercial-off-the-shelf (COTS) many-core processors offer the performance needed for computationally intensive safety-critical real-time applications such as autonomous driving. However, these consumer-grade many-core processors are increasingly susceptible to faults because of their highly integrated design. In this paper, we present a fault-tolerance framework that eases the usage of COTS many-core processors for safety-critical applications. Our framework employs an adaptable software-based fault-tolerance mechanism that combines N-Modular Redundancy (NMR) with a repair process and a rejuvenating round-robin voting scheme. A Stochastic Activity Network (SAN) model of the fault-tolerance mechanism allows the framework to adapt the parameters of the mechanism such that a specified target availability is achieved with minimum overhead. Experiments on a cycle-accurate simulator empirically prove the correctness of the SAN model and evaluate the overhead of the framework.
Keywords
"Fault tolerance","Fault tolerant systems","Maintenance engineering","Program processors","Adaptation models","Nuclear magnetic resonance"
Publisher
ieee
Conference_Titel
Dependable Computing Conference (EDCC), 2015 Eleventh European
Type
conf
DOI
10.1109/EDCC.2015.32
Filename
7371964