Title :
Comparing Fault Tolerance Mechanisms for Self-Organizing Resource Management in Grids
Author_Institution :
Univ. of Paderborn, Paderborn
Abstract :
Grid users require the established usage of service level agreements (SLAs). To prevent SLA violations in the case of failures, current research focuses on the development of fault-tolerance (FT) mechanisms such as migration. My new approach integrates risk assessment into the Grid fabric in order to estimate the risk of resource failures. In systems with high workload, initiating an FT mechanism also affects other jobs. To find the most profitable solution, these effects have to be estimated and compared. This paper presents an automatic process for comparing and selecting an FT mechanism in a risk-aware, self-organizing resource management system that takes different resource stabilities into account.
Keywords :
fault tolerant computing; grid computing; risk management; self-adjusting systems; Grids; fault tolerance; risk assessment; self-organizing resource management; service level agreements; Checkpointing; Condition monitoring; Fabrics; Fault tolerance; Outsourcing; Parallel processing; Quality of service; Resource management; Risk management; Stability
Conference_Title :
Semantics, Knowledge and Grid, Third International Conference on
Conference_Location :
Shan Xi
Print_ISBN :
0-7695-3007-9
Electronic_ISBN :
978-0-7695-3007-9
DOI :
10.1109/SKG.2007.40