Title :
Grid workflow: a flexible failure handling framework for the grid
Author :
Hwang, Soonwook ; Kesselman, Carl
Author_Institution :
Information Sciences Institute, University of Southern California, Marina del Rey, CA, USA
Abstract :
The generic, heterogeneous, and dynamic nature of the grid requires a new form of failure recovery mechanism to address its unique requirements, such as support for diverse failure handling strategies, separation of failure handling strategies from application code, and user-defined exception handling. We propose a grid workflow system (grid-WFS), a flexible failure handling framework for the grid that addresses these grid-specific failure recovery requirements. The framework derives its flexibility from using workflow structure as a high-level recovery policy specification. We show how this use of high-level workflow structure allows users to achieve failure recovery in a variety of ways, depending on the requirements and constraints of their applications. We also demonstrate that it enables users not only to rapidly prototype and investigate failure handling strategies, but also to change them easily by modifying the encompassing workflow structure while the application code remains intact. Finally, we present an experimental evaluation of our framework using a simulation, demonstrating the value of supporting multiple failure recovery techniques in grid systems to achieve high performance in the presence of failures.
Keywords :
distributed processing; error handling; fault tolerant computing; grid computing; system recovery; failure handling framework; failure handling strategies; failure recovery mechanism; grid workflow system; grid-WFS; high-level recovery policy specification; high-level workflow structure; multiple failure recovery techniques; Application software; Computational modeling; Data security; Grid computing; High performance computing; Internet; Personal communication networks; Prototypes; Supercomputers; Workstations;
Conference_Title :
High Performance Distributed Computing, 2003. Proceedings. 12th IEEE International Symposium on
Print_ISBN :
0-7695-1965-2
DOI :
10.1109/HPDC.2003.1210023