DocumentCode
769506
Title
A nonpreemptive real-time scheduler with recovery from transient faults and its implementation
Author
Mossé, Daniel ; Melhem, Rami ; Ghosh, Sunondo
Author_Institution
Dept. of Comput. Sci., Pittsburgh Univ., PA, USA
Volume
29
Issue
8
fYear
2003
Firstpage
752
Lastpage
767
Abstract
Real-time systems (RTS) are those whose correctness depends on satisfying the required functional as well as the required temporal properties. Due to the criticality of such systems, recovery from faults is an essential part of a RTS. In many systems, such as those supporting space applications, single event upsets (SEUs) are the prevalent type of faults; SEUs are transient faults and affect a single task at a time. We present a scheme to guarantee that the execution of real-time tasks can tolerate SEUs and intermittent faults assuming any queue-based scheduling technique. Three algorithms are presented to solve the problem of adding fault tolerance to a queue of real-time tasks by reserving sufficient slack in a schedule so that recovery can be carried out before the task deadline without compromising guarantees given to other tasks. The first algorithm is a dynamic programming optimal solution, the second is a linear-time heuristic for scheduling dynamic tasks, and the third algorithm comprises extensions to address queues with gaps between tasks (gaps are caused by precedence, resource, or timing constraints). We show through simulations that the heuristics closely approximate the optimal algorithm. Finally, we describe the implementation of the modified admission control algorithm, non-preemptive scheduler, and recovery mechanism in the FT-RT-Mach operating system.
Keywords
computational complexity; dynamic programming; fault tolerant computing; graph theory; operating systems (computers); queueing theory; real-time systems; scheduling; system recovery; FT-RT-Mach operating system; dynamic programming; dynamic task scheduling; fault tolerance; intermittent fault; layered graph; linear-time heuristic; modified admission control algorithm; nonpreemptive real-time scheduler; optimal algorithm; queue-based scheduling technique; real-time system; real-time task execution; single event upset; transient fault recovery; Admission control; Dynamic programming; Dynamic scheduling; Fault tolerance; Heuristic algorithms; Real time systems; Scheduling algorithm; Single event transient; Single event upset; Timing;
fLanguage
English
Journal_Title
Software Engineering, IEEE Transactions on
Publisher
ieee
ISSN
0098-5589
Type
jour
DOI
10.1109/TSE.2003.1223648
Filename
1223648
Link To Document