Title :
R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems
Author :
Junsung Kim ; Lakshmanan, K. ; Rajkumar, R.
Author_Institution :
Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
fDate :
June 29 2010-July 1 2010
Abstract :
Many emerging embedded real-time applications, such as SCADA (Supervisory Control and Data Acquisition), autonomous vehicles, and advanced avionics, require a high degree of dependability. Dealing with tasks that have both hard real-time requirements and high reliability constraints is a key challenge in such systems. This paper addresses the problem of guaranteeing reliability requirements with bounded recovery times on fail-stop processors in fault-tolerant multiprocessor real-time systems. We classify tasks by their recovery-time requirements into (i) Hard Recovery, (ii) Soft Recovery, and (iii) Best-Effort Recovery tasks. We then introduce the notion of a Hot Standby for Hard Recovery tasks and a Cold Standby for Soft Recovery and Best-Effort Recovery tasks. To maximize the benefits of a Hot Standby, a task's replicas should not be co-located on the same processor. For this purpose, we propose a task allocation algorithm for Hot Standby replicas called R-BFD (Reliable Best-Fit Decreasing), which uses 37% fewer processors than BFD-P (Best-Fit Decreasing augmented with placement constraints). For tasks with more relaxed recovery-time constraints, additional optimization is possible by using a Cold Standby that is activated only when failures occur. Given a system reliability requirement, and hence a maximum number of processor failures to tolerate, the resource overprovisioning required for Cold Standby replicas from multiple processors can be consolidated. An algorithm called R-BATCH (Reliable Bin-packing Algorithm for Tasks with Cold standby and Hot standby) reduces the required number of processors by up to 45% compared to a pure Hot Standby replication technique based on R-BFD.
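The baseline named in the abstract, Best-Fit Decreasing with a placement constraint, can be illustrated with a minimal sketch. This is not the paper's R-BFD or R-BATCH, whose details the abstract does not give. The function name `bfd_p`, the `Processor` class, the per-task utilization model, and the unit processor capacity are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Hypothetical processor model: a utilization budget plus the tasks placed on it."""
    load: float = 0.0
    task_ids: set = field(default_factory=set)


def bfd_p(tasks, replicas=2, capacity=1.0):
    """Best-fit decreasing with a 'no two replicas of a task together' constraint.

    tasks    -- dict mapping task id to utilization (assumed model)
    replicas -- total copies of each task, primary included (assumption)
    capacity -- schedulable utilization per processor (assumed to be 1.0)
    """
    eps = 1e-9  # tolerance for floating-point sums
    for tid, u in tasks.items():
        if u > capacity + eps:
            raise ValueError(f"task {tid!r} cannot fit on any processor")

    # One item per replica, considered in order of decreasing utilization.
    items = sorted(
        ((tid, u) for tid, u in tasks.items() for _ in range(replicas)),
        key=lambda item: item[1],
        reverse=True,
    )

    processors = []
    for tid, u in items:
        # Feasible processors: enough spare capacity and no copy of this task yet.
        feasible = [
            p for p in processors
            if tid not in p.task_ids and p.load + u <= capacity + eps
        ]
        if feasible:
            # Best fit: the feasible processor left with the least spare capacity.
            target = min(feasible, key=lambda p: capacity - p.load - u)
        else:
            target = Processor()
            processors.append(target)
        target.load += u
        target.task_ids.add(tid)
    return processors


if __name__ == "__main__":
    placement = bfd_p({"a": 0.6, "b": 0.4, "c": 0.3})
    for index, proc in enumerate(placement):
        print(index, sorted(proc.task_ids), round(proc.load, 2))
```

The placement constraint appears only in the feasibility filter, so the sketch reduces to plain best-fit decreasing when `replicas=1`.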
Keywords :
SCADA systems; bin packing; embedded systems; fault tolerant computing; optimisation; processor scheduling; real-time systems; R-BATCH; SCADA; autonomous vehicles; best-effort recovery; bounded recovery times; fail-stop processors; fault tolerant multiprocessor real-time system; hard recovery; high reliability constraint; hot standby replicas; optimization; processor failures; reliable best-fit decreasing; reliable bin packing algorithm; soft recovery; task allocation algorithm; task partitioning; Fault tolerance; Fault tolerant systems; Processor scheduling; Program processors; Real time systems; Resource management; bin-packing algorithms; embedded systems; fault tolerance; real-time systems; task replication;
Conference_Title :
Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on
Conference_Location :
Bradford
Print_ISBN :
978-1-4244-7547-6
DOI :
10.1109/CIT.2010.321