DocumentCode :
2570719
Title :
Graph-Based Task Replication for Workflow Applications
Author :
Sirvent, Raül ; Badia, Rosa M. ; Labarta, Jesús
Author_Institution :
Barcelona Supercomput. Center, Barcelona, Spain
fYear :
2009
fDate :
25-27 June 2009
Firstpage :
20
Lastpage :
28
Abstract :
The Grid is a heterogeneous and dynamic environment which enables distributed computation. This makes it a technology prone to failures. Some related work uses replication to overcome failures in a set of independent tasks and in workflow applications, but it does not consider possible resource limitations when scheduling the replicas. In this paper, we focus on the use of task replication techniques for workflow applications, aiming not only to tolerate possible failures in an execution, but also to speed up the computation without requiring the user to implement an application-level checkpoint, which may be a difficult task depending on the application. Moreover, we also study what to do when there are not enough resources to replicate all running tasks. We establish different replication priorities depending on the graph of the workflow application, giving higher priority to tasks with a higher output degree. We have implemented our proposed policy in the GRID superscalar system, and we have run fastDNAml as an experiment to show that our objectives are reached. Finally, we have identified and studied a problem which may arise due to the use of replication in workflow applications: the replication wait time.
Keywords :
checkpointing; graph theory; grid computing; software fault tolerance; workflow management software; application-level checkpoint; distributed computation; failure tolerance; fastDNAml; graph-based task replication; grid superscalar system; replication wait time; workflow applications; Computer architecture; Distributed computing; Fault tolerance; Fault tolerant systems; High performance computing; Processor scheduling; Proposals; task replication; workflow scheduling
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing and Communications, 2009. HPCC '09. 11th IEEE International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4244-4600-1
Electronic_ISBN :
978-0-7695-3738-2
Type :
conf
DOI :
10.1109/HPCC.2009.29
Filename :
5166972