Title :
Runtime Fault-Handling for Job-Flow Management in Grid Environments
Author :
Dasgupta, Gargi ; Ezenwoye, Onyeka ; Fong, Liana ; Kalayci, Selim ; Sadjadi, S. Masoud ; Viswanathan, Balaji
Author_Institution :
IBM India Res. Lab., New Delhi
Abstract :
The execution of job-flow applications is a reality today in academic and industrial domains. In this paper, we propose an approach to adding self-healing behavior to the execution of job flows without the need to modify the job-flow engines or redevelop the job flows themselves. We show the feasibility of our non-intrusive approach to self-healing by inserting a generic proxy into an existing two-level job-flow management system, which employs job-flow-based service orchestration at the upper level and service choreography at the lower level. The generic proxy is inserted transparently between these two layers so that it can intercept all their interactions. We developed a prototype of our approach in a real Grid environment to show how the proxy facilitates runtime handling for failure recovery.
Keywords :
fault tolerant computing; grid computing; failure recovery; generic proxy; grid environments; job flow based service orchestration; job flow engines; job flow execution; runtime fault-handling; self-healing behavior; service choreography; two-level job-flow management system; Computer architecture; Environmental management; Fault tolerance; Job shop scheduling; Logic; Portals; Processor scheduling; Resource management; Runtime environment; fault-tolerance; job-flow management; job-flows; meta-scheduler
Conference_Title :
Autonomic Computing, 2008. ICAC '08. International Conference on
Conference_Location :
Chicago, IL
Print_ISBN :
978-0-7695-3175-5
Electronic_ISBN :
978-0-7695-3175-5
DOI :
10.1109/ICAC.2008.16