DocumentCode
2047646
Title
RFOH: A New Fault Tolerant Job Scheduler in Grid Computing
Author
Khanli, Leili Mohammad ; Far, Maryam Etminan ; Rahmani, Amir Masoud
Author_Institution
Dept. of Comput. Sci., Tabriz Univ., Tabriz, Iran
Volume
1
fYear
2010
fDate
19-21 March 2010
Firstpage
422
Lastpage
425
Abstract
The goal of grid computing is to aggregate the power of widely distributed resources. Because the probability of failure is high in such systems, fault tolerance has become a crucial concern in computational grids. In this paper, we propose a new strategy named RFOH for fault-tolerant job scheduling in computational grids. This strategy maintains the history of fault occurrences of resources in the Grid Information Server (GIS). Whenever a resource broker has jobs to schedule, it uses this information in a genetic algorithm to find a near-optimal solution to the scheduling problem. Further, the strategy increases the percentage of jobs executed within their specified deadlines. The experimental results show that the strategy can combine user satisfaction and reliability. Using checkpoint techniques, the proposed strategy can make grid scheduling more reliable and efficient.
Keywords
checkpointing; fault tolerance; genetic algorithms; grid computing; job shop scheduling; Grid scheduling; RFOH; checkpoint techniques; computational Grid; distributed resources; fault tolerant job scheduler; genetic algorithm; grid computing; grid information server; Application software; Biological cells; Distributed computing; Fault tolerance; Genetic algorithms; Grid computing; History; Power engineering and energy; Power engineering computing; Processor scheduling
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Engineering and Applications (ICCEA), 2010 Second International Conference on
Conference_Location
Bali Island
Print_ISBN
978-1-4244-6079-3
Electronic_ISBN
978-1-4244-6080-9
Type
conf
DOI
10.1109/ICCEA.2010.88
Filename
5445793