• DocumentCode
    3103383
  • Title
    A New Fault Tolerance Heuristic for Scientific Workflows in Highly Distributed Environments Based on Resubmission Impact
  • Author
    Plankensteiner, Kassian ; Prodan, Radu ; Fahringer, Thomas
  • Author_Institution
    Inst. of Comput. Sci., Univ. of Innsbruck, Innsbruck, Austria
  • fYear
    2009
  • fDate
    9-11 Dec. 2009
  • Firstpage
    313
  • Lastpage
    320
  • Abstract
    Even though highly distributed environments such as Clouds and Grids are increasingly used for high-performance e-science applications, they still cannot deliver the robustness and reliability needed for widespread acceptance as ubiquitous scientific tools. To overcome this problem, existing systems resort to fault tolerance mechanisms such as task replication and task resubmission. In this paper, we propose a new heuristic called resubmission impact to enhance the fault tolerance support for scientific workflows in highly distributed systems. In contrast to related approaches, our method can be used effectively even on systems that lack historical failure trace data. Simulated experiments with three real scientific workflows in the Austrian Grid environment show that our algorithm drastically reduces resource waste compared to conservative task replication and resubmission techniques, while achieving comparable execution performance with only a slight decrease in success probability.
  • Keywords
    fault tolerant computing; grid computing; natural sciences computing; Austrian Grid environment; distributed environment; distributed system; e-science; fault tolerance heuristic; resubmission impact; scientific workflow; task replication; Application software; Clouds; Computer networks; Computer science; Concurrent computing; Distributed computing; Fault tolerance; Fault tolerant systems; Processor scheduling; Robustness; fault tolerance; highly distributed environments; scheduling; scientific workflow;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    e-Science, 2009. e-Science '09. Fifth IEEE International Conference on
  • Conference_Location
    Oxford
  • Print_ISBN
    978-0-7695-3877-8
  • Type
    conf
  • DOI
    10.1109/e-Science.2009.51
  • Filename
    5380852