Title :
Fault tolerance scheduling in economic grids
Author :
Yin, Yulan ; Zhao, Yanhong ; Dai, Fengna
Author_Institution :
Dept. of Math. & Phys., Anhui Univ. of Sci. & Technol., Huainan, China
Abstract :
Computational grids use heterogeneous and geographically distributed resources. The unreliable nature of grid infrastructure raises challenges in resource management, scheduling, and reliability. Effective utilization of computational resources requires efficient job scheduling and reliable fault tolerance. This paper addresses these problems by combining job scheduling with a fault tolerance mechanism based on checkpoint replication. The mechanism sets job checkpoints according to the resource failure rate. If a resource fails, the job is restarted from its last successful state using a checkpoint file from another grid resource. A critical aspect of automatic recovery is the availability of checkpoint files, and replication is one strategy for increasing that availability. A replica resource selection algorithm is proposed to provide a checkpoint replication service. Simulated experimental results demonstrate that the proposed approach effectively schedules jobs in a fault-tolerant way in economic grids.
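The abstract does not give the formulas or the selection rule the authors use, so the following is only an illustrative sketch of the two ideas it names: a checkpoint interval derived from the resource failure rate, and a choice of replica resources for checkpoint files. The interval uses Young's first-order approximation, sqrt(2C/λ), which is a swapped-in textbook rule and not necessarily the paper's. The `Resource` fields, the per-replica `budget`, and the "lowest failure rate first" ordering are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    failure_rate: float  # assumed unit: failures per hour
    cost: float          # assumed: price of storing one replica, in grid currency


def checkpoint_interval(checkpoint_cost_h: float, failure_rate: float) -> float:
    """Young's approximation for the time between checkpoints, in hours.

    checkpoint_cost_h is the time needed to write one checkpoint (hours).
    A higher failure rate gives a shorter interval.
    """
    if failure_rate <= 0:
        return math.inf  # a resource that never fails needs no checkpoints
    return math.sqrt(2.0 * checkpoint_cost_h / failure_rate)


def select_replicas(resources, exclude, k, budget):
    """Hypothetical selection rule: up to k affordable resources,
    most reliable first, skipping the resource that runs the job."""
    candidates = sorted(
        (r for r in resources if r.name != exclude and r.cost <= budget),
        key=lambda r: (r.failure_rate, r.cost),
    )
    return candidates[:k]


grid = [
    Resource("A", failure_rate=0.5, cost=1.0),
    Resource("B", failure_rate=0.1, cost=3.0),
    Resource("C", failure_rate=0.2, cost=2.0),
    Resource("D", failure_rate=0.05, cost=10.0),
]

# The job runs on B; keep two replicas, each costing at most 5.0.
interval = checkpoint_interval(checkpoint_cost_h=0.5, failure_rate=0.25)
replicas = select_replicas(grid, exclude="B", k=2, budget=5.0)
print(interval, [r.name for r in replicas])
```

In this sketch the economic aspect appears only as a price cap per replica; a real economic grid scheduler would likely weigh cost against reliability rather than filter on it.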
Keywords :
checkpointing; fault tolerance; grid computing; scheduling; automatic recovery; checkpoint replication service; computational grids; computational resources utilization; distributed resources; economic grids; fault tolerance mechanism; fault tolerance scheduling; replica resource selection algorithm; Economics; Fault tolerance; Fault tolerant systems; Monitoring; Processor scheduling; Resource management; Servers; Checkpoint Replication; Economic Grid; Fault Tolerance; Resource Scheduling;
Conference_Title :
Computer Science and Service System (CSSS), 2011 International Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4244-9762-1
DOI :
10.1109/CSSS.2011.5974954