• DocumentCode
    3186446
  • Title
    Four level provenance support to achieve portable reproducibility of scientific workflows
  • Author
    Banati, A.; Kacsuk, P.; Kozlovszky, M.
  • Author_Institution
    Biotech Lab., Obuda Univ., Budapest, Hungary
  • fYear
    2015
  • fDate
    25-29 May 2015
  • Firstpage
    241
  • Lastpage
    244
  • Abstract
    In the scientific community, one of the most vital challenges is the reproducibility of workflow execution. To reproduce the results of an experiment, provenance information must be collected on the one hand, and the dependencies of the execution must be eliminated on the other. With respect to the workflow execution environment, we differentiate four levels of provenance: infrastructural, environmental, workflow, and data provenance. During re-execution, components can change at any level, and capturing the data of each level targets a different problem. For example, storing the environmental and infrastructural parameters enables the portability of workflows between different parallel and distributed systems (grid, HPC, cloud). The descriptors of the workflow model enable tracking the different versions of the workflow and their impact on the execution. Our goal is to capture the optimal set of parameters, in both number and type, and to reconstruct the way data are produced independently of the environment. In this paper we investigate the necessary and sufficient parameters of workflow reproducibility and give a mathematical formula to determine the rate of reproducibility. These measurements allow the scientist to decide on the next steps toward creating reproducible workflows.
  • Keywords
    cloud computing; electronic data interchange; grid computing; parallel processing; scientific information systems; HPC system; cloud system; component re-execution; data production; data provenance; distributed systems; environmental parameter; environmental provenance; execution dependency elimination; four-level provenance support; grid system; infrastructural parameter; infrastructural provenance; mathematical formula; optimal parameters; parallel systems; portable reproducibility; provenance information collection; reproducibility rate; scientific workflows; workflow execution reproducibility; workflow portability; workflow provenance; Best practices; Communities; Data models; Hardware; Mathematical model; Ports (Computers); Virtual machining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2015 38th International Convention on
  • Conference_Location
    Opatija
  • Type
    conf
  • DOI
    10.1109/MIPRO.2015.7160272
  • Filename
    7160272