• DocumentCode
    625656
  • Title
    The Bounded Data Reuse Problem in Scientific Workflows
  • Author
    Zohrevandi, Mohsen ; Bazzi, Rida A.
  • Author_Institution
    Sch. of Comput., Inf., & Decision Syst. Eng., Arizona State Univ., Tempe, AZ, USA
  • fYear
    2013
  • fDate
    20-24 May 2013
  • Firstpage
    1051
  • Lastpage
    1062
  • Abstract
    Large datasets and time-consuming processes have become the norm in scientific computing applications. The exploration phase in the development of scientific workflows involves trial and error with workflow components, which can be slow because the workflow tasks themselves are time-consuming. These facts suggest the possibility of reducing development time by reusing intermediate data whenever possible. However, storage space is always limited. This introduces a problem: given a limited amount of storage, which intermediate datasets from one workflow should be kept for reuse in another workflow? For the general class of series-parallel graphs, we model this problem using a non-linear integer programming formulation and show that it is NP-hard. We provide an optimal branch-and-bound algorithm as well as efficient heuristics. We conducted experiments over a large set of randomly generated workflows as well as a smaller set of synthetic workflows based on real-world workflows used by scientists in different disciplines. Our experiments show that the best solution produced by the heuristics differs from the optimal value by less than 1% on average.
  • Keywords
    data handling; graph theory; integer programming; nonlinear programming; parallel programming; NP-hard problem; bounded data reuse problem; large datasets; nonlinear integer programming; scientific computing; scientific workflows; series parallel graphs; time-consuming process; trial-and-error; Computational modeling; Data models; Educational institutions; Heuristic algorithms; Linear programming; Merging; Smoothing methods; Data Reuse; Intermediate Data; Scientific Workflows; Series-Parallel;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on
  • Conference_Location
    Boston, MA
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4673-6066-1
  • Type
    conf
  • DOI
    10.1109/IPDPS.2013.71
  • Filename
    6569884