DocumentCode :
3182432
Title :
Incorporating fault tolerance in GA-based scheduling in grid environment
Author :
Upadhyay, Neeraj ; Misra, Manoj
Author_Institution :
Electron. & Comput. Eng. Dept., Indian Inst. of Technol., Roorkee, India
fYear :
2011
fDate :
11-14 Dec. 2011
Firstpage :
772
Lastpage :
777
Abstract :
Grid systems differ from traditional distributed systems in their large scale, heterogeneity and dynamism. These factors contribute to a higher frequency of faults: large scale lowers the Mean Time To Failure (MTTF); heterogeneity causes interaction faults (protocol mismatches) between dissimilar communicating nodes; and dynamism, in which resource availability varies as resources autonomously enter and leave the grid, affects the execution of jobs. Another factor that increases the probability of application failure is that grid applications are long-running computations that can take days to finish. Incorporating fault tolerance in scheduling algorithms is one approach to handling faults in a grid environment. Genetic Algorithms (GAs) are a popular class of meta-heuristic algorithms used for grid scheduling. They are stochastic search algorithms based on the natural process of fitness-based selection and reproduction. This paper combines GA-based scheduling with fault tolerance techniques such as dynamic checkpointing by modifying the fitness function. Certain scenarios, such as checkpointing without migration for resources with different downtimes and the autonomous nature of grid resource providers, are also considered in building the fitness functions. The motivation behind the work is that scheduling-assisted fault tolerance would help find a schedule in which jobs complete in the minimum possible time even when resources are prone to failure, and thus help meet job deadlines. Simulation results for the proposed techniques are presented in terms of the makespan, flowtime and fitness value of the resulting schedule. The results show that the adaptive checkpointing approaches improve makespan and flowtime over the static checkpointing approach. Also, the approach that takes into consideration the last failure times of resources performs better than the approach based only on the mean failure times of resources.
Keywords :
checkpointing; genetic algorithms; grid computing; scheduling; software fault tolerance; GA based scheduling; distributed systems; fault tolerance; genetic algorithms; grid environment; grid resource providers; grid systems; interaction faults; mean time to failure; static checkpointing approach; Biological cells; Checkpointing; Fault tolerance; Fault tolerant systems; Genetic algorithms; Scheduling; Fault Tolerance; Flowtime; Genetic Algorithm (GA); Grid; Makespan;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information and Communication Technologies (WICT), 2011 World Congress on
Conference_Location :
Mumbai
Print_ISBN :
978-1-4673-0127-5
Type :
conf
DOI :
10.1109/WICT.2011.6141344
Filename :
6141344