Title :
A context saving fault tolerant approach for a shared memory many-core architecture
Author :
Wachter, Eduardo ; Ventroux, Nicolas ; Moraes, Fernando G.
Author_Institution :
FACIN, PUCRS, Porto Alegre, Brazil
Abstract :
Runtime fault-tolerance mechanisms are mandatory in many-core architectures to cope with transient and permanent faults. This issue becomes even more relevant in aggressive technology nodes because of process variability, aging effects, and susceptibility to upsets, among other factors. This work proposes periodically saving the task context and, when an error is detected, rescheduling tasks from the last known reliable state while avoiding the faulty processor. The technique is implemented on an embedded multicore architecture named P2012. The proposed fault-tolerant approach induces a limited overhead of 9.37% in an industrial image processing application while guaranteeing full error recovery whenever an error is detected.
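The context-saving and rollback idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' P2012 implementation: the names `Task` and `CheckpointScheduler`, the fixed checkpoint period, the deep-copied task state, the rollback of all tasks together (a coordinated checkpoint), and the round-robin remapping onto healthy processors are all hypothetical assumptions made for illustration.

```python
import copy
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    progress: int = 0     # completed work units
    accumulator: int = 0  # hypothetical application state

    def step(self) -> None:
        # Hypothetical unit of work: process one item and update the state.
        self.progress += 1
        self.accumulator += self.progress


class CheckpointScheduler:
    """Sketch of periodic context saving with rollback and rescheduling.

    Assumptions (not from the paper): task state can be deep-copied, a
    checkpoint is taken every `period` steps, all tasks roll back together,
    and a fault report names the processor on which the error was detected.
    """

    def __init__(self, tasks, processors, period):
        self.period = period
        self.healthy = list(processors)
        self.tasks = {t.name: t for t in tasks}
        # Round-robin initial mapping of tasks onto processors.
        self.mapping = {t.name: self.healthy[i % len(self.healthy)]
                        for i, t in enumerate(tasks)}
        self.steps = 0
        self.save_context()

    def save_context(self):
        # Store the last known reliable context of every task.
        self.saved = copy.deepcopy(self.tasks)
        self.saved_step = self.steps

    def run_step(self):
        for t in self.tasks.values():
            t.step()
        self.steps += 1
        if self.steps % self.period == 0:
            self.save_context()

    def on_fault(self, faulty_processor):
        # Exclude the faulty processor from future scheduling.
        self.healthy.remove(faulty_processor)
        if not self.healthy:
            raise RuntimeError("no healthy processor left")
        # Roll every task back to the last reliable state.
        self.tasks = copy.deepcopy(self.saved)
        self.steps = self.saved_step
        # Remap tasks that were running on the faulty processor.
        for i, (name, proc) in enumerate(sorted(self.mapping.items())):
            if proc == faulty_processor:
                self.mapping[name] = self.healthy[i % len(self.healthy)]


if __name__ == "__main__":
    sched = CheckpointScheduler([Task("A"), Task("B"), Task("C")],
                                ["P0", "P1"], period=3)
    for _ in range(4):
        sched.run_step()
    sched.on_fault("P0")  # error detected on P0 after step 4
    print(sched.steps, sched.mapping)
```

In this sketch the error detected after step 4 discards the work done since the checkpoint at step 3, and the tasks previously mapped to P0 resume on P1. Choosing a shorter period reduces the work lost on rollback at the cost of more frequent context saving, which is the kind of overhead trade-off the abstract quantifies.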
Keywords :
embedded systems; fault tolerance; multiprocessing systems; system recovery; system-on-chip; P2012 embedded multicore architecture; aging effects; context saving fault tolerant approach; full-error recovery; industrial image processing; permanent faults; process variability; reschedule tasks; runtime fault-tolerance; shared memory many-core architecture; transient faults; upset susceptibility; Computer architecture; Context; Fault tolerance; Fault tolerant systems; Hardware; Software; Synchronization; NoC-based MPSoC; checkpointing; context saving; fault recovery; rollback;
Conference_Title :
Circuits and Systems (ISCAS), 2015 IEEE International Symposium on
Conference_Location :
Lisbon
DOI :
10.1109/ISCAS.2015.7168947