DocumentCode :
3103383
Title :
A New Fault Tolerance Heuristic for Scientific Workflows in Highly Distributed Environments Based on Resubmission Impact
Author :
Plankensteiner, Kassian ; Prodan, Radu ; Fahringer, Thomas
Author_Institution :
Inst. of Comput. Sci., Univ. of Innsbruck, Innsbruck, Austria
fYear :
2009
fDate :
9-11 Dec. 2009
Firstpage :
313
Lastpage :
320
Abstract :
Even though highly distributed environments such as Clouds and Grids are increasingly used for high-performance e-science applications, they still cannot deliver the robustness and reliability needed for widespread acceptance as ubiquitous scientific tools. To overcome this problem, existing systems resort to fault tolerance mechanisms such as task replication and task resubmission. In this paper, we propose a new heuristic called resubmission impact to enhance the fault tolerance support for scientific workflows in highly distributed systems. In contrast to related approaches, our method can be used effectively even on systems for which no historical failure trace data is available. Simulated experiments with three real scientific workflows in the Austrian Grid environment show that our algorithm drastically reduces resource waste compared to conservative task replication and resubmission techniques, while achieving comparable execution performance and only a slight decrease in success probability.
Keywords :
fault tolerant computing; grid computing; natural sciences computing; Austrian Grid environment; distributed environment; distributed system; e-science; fault tolerance heuristic; resubmission impact; scientific workflow; task replication; Application software; Clouds; Computer networks; Computer science; Concurrent computing; Distributed computing; Fault tolerance; Fault tolerant systems; Processor scheduling; Robustness; fault tolerance; highly distributed environments; scheduling; scientific workflow;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
e-Science, 2009. e-Science '09. Fifth IEEE International Conference on
Conference_Location :
Oxford
Print_ISBN :
978-0-7695-3877-8
Type :
conf
DOI :
10.1109/e-Science.2009.51
Filename :
5380852