Title :
A nonpreemptive real-time scheduler with recovery from transient faults and its implementation
Author :
Mossé, Daniel ; Melhem, Rami ; Ghosh, Sunondo
Author_Institution :
Dept. of Comput. Sci., Pittsburgh Univ., PA, USA
Abstract :
Real-time systems (RTS) are those whose correctness depends on satisfying the required functional as well as the required temporal properties. Due to the criticality of such systems, recovery from faults is an essential part of a RTS. In many systems, such as those supporting space applications, single event upsets (SEUs) are the prevalent type of faults; SEUs are transient faults and affect a single task at a time. We present a scheme to guarantee that the execution of real-time tasks can tolerate SEUs and intermittent faults assuming any queue-based scheduling technique. Three algorithms are presented to solve the problem of adding fault tolerance to a queue of real-time tasks by reserving sufficient slack in a schedule so that recovery can be carried out before the task deadline without compromising guarantees given to other tasks. The first algorithm is a dynamic programming optimal solution, the second is a linear-time heuristic for scheduling dynamic tasks, and the third algorithm comprises extensions to address queues with gaps between tasks (gaps are caused by precedence, resource, or timing constraints). We show through simulations that the heuristics closely approximate the optimal algorithm. Finally, we describe the implementation of the modified admission control algorithm, non-preemptive scheduler, and recovery mechanism in the FT-RT-Mach operating system.
Keywords :
computational complexity; dynamic programming; fault tolerant computing; graph theory; operating systems (computers); queueing theory; real-time systems; scheduling; system recovery; FT-RT-Mach operating system; dynamic programming; dynamic task scheduling; fault tolerance; intermittent fault; layered graph; linear-time heuristic; modified admission control algorithm; nonpreemptive real-time scheduler; optimal algorithm; queue-based scheduling technique; real-time system; real-time task execution; single event upset; transient fault recovery; Admission control; Dynamic programming; Dynamic scheduling; Fault tolerance; Heuristic algorithms; Real time systems; Scheduling algorithm; Single event transient; Single event upset; Timing;
Journal_Title :
Software Engineering, IEEE Transactions on
DOI :
10.1109/TSE.2003.1223648