Title :
Fault Tolerance Mechanisms for SLA-aware Resource Management
Author :
Hovestadt, Matthias
Author_Institution :
Paderborn Center for Parallel Computing, Paderborn University
Abstract :
Future grid systems will demand properties such as runtime responsibility, predictability, and a guaranteed level of service quality. In this context, service level agreements (SLAs) are of central importance. Many ongoing research projects already focus on realizing the required mechanisms at the grid middleware layer. However, concentrating only on grid middleware is not enough. The underlying resource management systems must also provide an increased QoS level, since they supply their resources to grid environments. The EU-funded project HPC4U aims to realize an SLA-aware resource management system. It allows the grid user to negotiate SLAs and ensures adherence to agreed SLAs by means of application-transparent checkpointing, snapshotting, and migration.
Keywords :
checkpointing; contracts; grid computing; middleware; quality of service; resource allocation; software fault tolerance; QoS; SLA; application-transparent checkpointing; fault tolerance mechanism; grid middleware layer; grid system; resource management system; service level agreement; service quality level; Business; Checkpointing; Context-aware services; Fault tolerance; Grid computing; Middleware; Parallel processing; Quality of service; Resource management; Runtime;
Conference_Title :
Parallel and Distributed Systems, 2005. Proceedings. 11th International Conference on
Conference_Location :
Fukuoka
Print_ISBN :
0-7695-2281-5
DOI :
10.1109/ICPADS.2005.155