DocumentCode
1778598
Title
A fast runtime fault recovery approach for NoC-based MPSoCs for performance constrained applications
Author
Wachter, Eduardo ; Erichsen, Augusto ; Juracy, Leonardo ; Amory, Alexandre ; Moraes, Fernando G.
Author_Institution
FACIN, PUCRS, Porto Alegre, Brazil
fYear
2014
fDate
1-5 Sept. 2014
Firstpage
1
Lastpage
7
Abstract
Mechanisms for runtime fault tolerance in Multi-Processor Systems-on-Chip (MPSoCs) are mandatory to cope with transient and permanent faults. This issue is even more relevant in nanotechnologies due to process variability, aging effects, and susceptibility to upsets, among other factors. The literature presents isolated solutions to deal with faults in the MPSoC communication infrastructure. In this context, one gap to be filled is the integration of all layers, resulting in a solution that copes with NoC faults from the physical layer up to the application layer. The goal of this work is to present an integrated runtime approach to cope with NoC faults in MPSoCs. The original contribution is the proposal of a set of hardware and software mechanisms to ensure both efficient and reliable communication in NoC-based MPSoCs. The proposal has an acceptable silicon area overhead and a small memory footprint. In the experiments, benchmarks (synthetic and real MPSoC applications) were simulated with thousands of random fault injections, and all of them executed correctly. Moreover, the average application execution time overhead is lower than 0.5%. This suggests that the proposed fault-tolerant method could be used in applications with reliability and performance constraints.
Keywords
fault tolerant computing; integrated circuit reliability; multiprocessor interconnection networks; network-on-chip; NoC-based MPSoC; aging effects; fast runtime fault recovery approach; memory footprint; multiprocessor system-on-chips; nanotechnologies; performance constrained applications; permanent faults; process variability; random fault injections; silicon area overhead; transient faults; Fault tolerance; Fault tolerant systems; Ports (Computers); Program processors; Protocols; Routing; System recovery; fault recovery; fault-tolerant NoCs; fault-tolerant communication
fLanguage
English
Publisher
IEEE
Conference_Titel
Integrated Circuits and Systems Design (SBCCI), 2014 27th Symposium on
Conference_Location
Aracaju
Type
conf
DOI
10.1145/2660540.2660986
Filename
6994638