Author_Institution :
Coll. of Inf. Syst. & Manage., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
As clouds are deployed widely in various fields, their reliability and availability have become major concerns for cloud service providers and users. Fault tolerance in clouds has therefore received a great deal of attention in both industry and academia, especially for real-time applications because of their safety-critical nature. A large body of research has addressed fault tolerance in distributed systems, in which fault-tolerant scheduling plays a significant role. However, few studies of fault-tolerant scheduling sufficiently consider virtualization and elasticity, two key features of clouds. To address this issue, this paper presents a fault-tolerant mechanism that extends the primary-backup model to incorporate the features of clouds. Meanwhile, for the first time, we propose an elastic resource provisioning mechanism in the fault-tolerant context to improve resource utilization. On the basis of the fault-tolerant mechanism and the elastic resource provisioning mechanism, we design novel fault-tolerant elastic scheduling algorithms for real-time tasks in clouds, named FESTAL, that aim to achieve both fault tolerance and high resource utilization. Extensive experiments with random synthetic workloads, as well as a workload from the latest version of the Google cloud tracelogs, are conducted on CloudSim to compare FESTAL with three baseline algorithms: Non-Migration-FESTAL (NMFESTAL), Non-Overlapping-FESTAL (NOFESTAL), and Elastic First Fit (EFF). The experimental results demonstrate that FESTAL effectively enhances the performance of virtualized clouds.
Keywords :
cloud computing; real-time systems; safety-critical software; scheduling; software fault tolerance; virtualisation; CloudSim; EFF; Google cloud tracelog; NMFESTAL; NOFESTAL; baseline algorithm; cloud service provider; distributed system; elastic first fit; elastic resource provisioning mechanism; elasticity; fault tolerance; fault-tolerant context; fault-tolerant elastic scheduling algorithm; fault-tolerant mechanism; fault-tolerant scheduling; high resource utilization; nonmigration-FESTAL; nonoverlapping-FESTAL; primary-backup model; random synthetic workload; real-time task; reliability; safety critical nature; virtualization; virtualized cloud; Dynamic scheduling; Fault tolerant systems; Resource management; Scheduling algorithms; Timing; Cloud