Title :
Tolerating transient faults in statically scheduled safety-critical embedded systems
Author :
Kandasamy, Nagarajan ; Hayes, John P. ; Murray, Brian T.
Author_Institution :
Dept. of Electr. Eng., Michigan Univ., Ann Arbor, MI, USA
Abstract :
Static off-line scheduling ensures predictability of worst-case behavior and high resource utilization for safety-critical applications, but it lacks the flexibility needed to deal with run-time fault tolerance. We present a temporal redundancy-based recovery technique that tolerates transient task failures in statically scheduled distributed embedded systems where tasks have timing, resource, and precedence constraints. Task failures are handled using precomputed contingency schedules that introduce adaptive fault tolerance into table-driven dispatchers. Failures are masked using the spare capacity on the affected processor, and the recovery scheme requires no hardware overhead. Our approach combines the benefits of static scheduling with the run-time flexibility needed for fault tolerance in low-cost embedded systems. We present a method to obtain contingency schedules and prove its correctness. We also evaluate the effectiveness of the proposed method through simulation.
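The idea of a table-driven dispatcher that switches to a precomputed contingency schedule can be illustrated with a minimal sketch. It is not the authors' algorithm: the single-processor model, the `Slot` fields, the rule of re-executing the failed task immediately and shifting later tasks, and the names `contingency_for`, `precompute`, and `dispatch` are all assumptions made for illustration. Precedence and resource constraints are not modeled.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Slot:
    start: int     # dispatch time in ticks (hypothetical time unit)
    task: str      # task name
    wcet: int      # worst-case execution time
    deadline: int  # absolute deadline


def contingency_for(primary: List[Slot], failed_index: int) -> Optional[List[Slot]]:
    """Off-line step: build the recovery table used if primary[failed_index] fails.

    Assumption: the failure is detected at the end of the failed slot, the task
    is re-executed at once, and later tasks are shifted into spare capacity.
    Returns None if some deadline cannot be met.
    """
    failed = primary[failed_index]
    t = failed.start + failed.wcet
    plan = []
    for slot in [failed] + list(primary[failed_index + 1:]):
        start = max(t, slot.start)  # never dispatch earlier than originally planned
        if start + slot.wcet > slot.deadline:
            return None
        plan.append(replace(slot, start=start))
        t = start + slot.wcet
    return plan


def precompute(primary: List[Slot]) -> List[Optional[List[Slot]]]:
    """One contingency table per primary slot, computed before run time."""
    return [contingency_for(primary, i) for i in range(len(primary))]


def dispatch(primary, contingencies, faulty_task=None) -> List[Tuple[int, str, str]]:
    """Run-time step: follow the primary table, switching tables on a failure.

    Models a single transient fault: the first execution of faulty_task fails.
    """
    trace = []
    for i, slot in enumerate(primary):
        if slot.task == faulty_task:
            trace.append((slot.start, slot.task, "failed"))
            backup = contingencies[i]
            if backup is None:
                raise RuntimeError("no feasible contingency schedule")
            trace.extend((s.start, s.task, "ok") for s in backup)
            return trace
        trace.append((slot.start, slot.task, "ok"))
    return trace


primary = [Slot(0, "A", 2, 6), Slot(2, "B", 3, 10), Slot(6, "C", 2, 12)]
tables = precompute(primary)
trace = dispatch(primary, tables, faulty_task="B")
```

In this example the primary schedule leaves enough spare capacity for any single task to be re-executed, so every contingency table is feasible and the dispatcher only performs a table lookup at run time.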
Keywords :
fault tolerant computing; safety-critical software; system recovery; redundancy-based recovery; run-time fault-tolerance; safety-critical; statically scheduled; transient task failures; Application software; Automotive engineering; Embedded system; Hardware; Job shop scheduling; Mars; Redundancy; Resource management; Runtime; Timing
Conference_Title :
Reliable Distributed Systems, 1999. Proceedings of the 18th IEEE Symposium on
Conference_Location :
Lausanne
Print_ISBN :
0-7695-0290-3
DOI :
10.1109/RELDIS.1999.805097