• DocumentCode
    3722941
  • Title

    Toward a Fault-Tolerance Framework for COTS Many-Core Systems

  • Author

    Peter Munk;Mohammad Shadi Alhakeem;Raphael Lisicki;Helge Parzyjegla;Jan Richling; Hei?

  • Author_Institution
    Corp. Sector Res. &
  • fYear
    2015
  • Firstpage
    167
  • Lastpage
    177
  • Abstract
    Commercial-off-the-shelf (COTS) many-core processors offer the performance needed for computationally intensive, safety-critical real-time applications such as autonomous driving. However, these consumer-grade many-core processors are increasingly susceptible to faults because of their highly integrated design. In this paper, we present a fault-tolerance framework that eases the use of COTS many-core processors for safety-critical applications. Our framework employs an adaptable software-based fault-tolerance mechanism that combines N-Modular Redundancy (NMR) with a repair process and a rejuvenating round-robin voting scheme. A Stochastic Activity Network (SAN) model of the fault-tolerance mechanism allows the framework to adapt the parameters of the mechanism such that a specified target availability is achieved with minimum overhead. Experiments on a cycle-accurate simulator empirically validate the SAN model and evaluate the overhead of the framework.
  • Keywords
    "Fault tolerance","Fault tolerant systems","Maintenance engineering","Program processors","Adaptation models","Nuclear magnetic resonance"
  • Publisher
    IEEE
  • Conference_Titel
    Dependable Computing Conference (EDCC), 2015 Eleventh European
  • Type

    conf

  • DOI
    10.1109/EDCC.2015.32
  • Filename
    7371964