Title :
Recursive Evaluation of Fault Tolerance Mechanisms for SLA Management
Author_Institution :
Univ. of Paderborn, Paderborn
Abstract :
Service level agreements (SLAs) have been introduced into the Grid to provide a basis for its commercial uptake. The challenge for Grid providers in agreeing to and operating SLA-bound jobs is to ensure that the SLAs are fulfilled even when failures occur. Hence, fault-tolerance mechanisms are an essential part of a provider's SLA management. Commercially operated clusters are highly utilized, so migrating a job typically affects other scheduled jobs. These effects arise because there are not enough free resources available to absorb all resource outages. Consequently, before a migration is initiated, its effects on other jobs have to be assessed, and the initiation of fault-tolerance (FT-) mechanisms has to be evaluated recursively. This paper presents a measure of the benefit of initiating an FT-mechanism, the recursive evaluation, and its termination condition. Evaluating the impact of an initiated chain of FT-mechanisms in this way is often more profitable than performing a single FT-mechanism, which makes it important for the commercialization of the Grid.
Keywords :
fault tolerance; quality of service; telecommunication network management; SLA management; commercial operated clusters; grid commercialization; grid providers; job migration; recursive evaluation; resource outages; service level agreements; termination condition; Availability; Business; Commercialization; Computer network management; Conference management; Fault tolerance; Parallel processing; Performance evaluation; Quality of service; Resumes; RMS; Risk Management; SLA;
Conference_Title :
Networking and Services, 2008. ICNS 2008. Fourth International Conference on
Conference_Location :
Gosier
Print_ISBN :
978-0-7695-3094-9
DOI :
10.1109/ICNS.2008.22