• DocumentCode
    3664205
  • Title

    Scheduling Computational Workflows on Failure-Prone Platforms

  • Author

    Guillaume Aupy;Anne Benoit;Henri Casanova;Yves Robert

  • Author_Institution
    Ecole Normale Super. de Lyon, Lyon, France
  • fYear
    2015
  • fDate
    5/1/2015 12:00:00 AM
  • Firstpage
    473
  • Lastpage
    482
  • Abstract
    We study the scheduling of computational workflows on compute resources that experience exponentially distributed failures. When a failure occurs, rollback and recovery is used to resume the execution from the last check pointed state. The scheduling problem is to minimize the expected execution time by deciding in which order to execute the tasks in the workflow and whether to checkpoint or not checkpoint a task after it completes. We give a polynomial-time algorithm for fork graphs and show that the problem is NP-complete with join graphs. Our main result is a polynomial-time algorithm to compute the execution time of a workflow with specified to-be-check pointed tasks. Using this algorithm as a basis, we propose efficient heuristics for solving the scheduling problem. We evaluate these heuristics for representative workflow configurations.
  • Keywords
    "Program processors","Schedules","Checkpointing","Processor scheduling","Polynomials","Scheduling","Heuristic algorithms"
  • Publisher
    ieee
  • Conference_Titel
    Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International
  • Type

    conf

  • DOI
    10.1109/IPDPSW.2015.33
  • Filename
    7284346