DocumentCode
3186446
Title
Four level provenance support to achieve portable reproducibility of scientific workflows
Author
Banati, A. ; Kacsuk, P. ; Kozlovszky, M.
Author_Institution
Biotech Lab., Obuda Univ., Budapest, Hungary
fYear
2015
fDate
25-29 May 2015
Firstpage
241
Lastpage
244
Abstract
In the scientific community, one of the most vital challenges is the reproducibility of workflow execution. To reproduce the results of an experiment, provenance information must be collected on the one hand, and the dependencies of the execution must be eliminated on the other. With respect to the workflow execution environment, we differentiate four levels of provenance: infrastructural, environmental, workflow and data provenance. During re-execution, components can change at every level, and capturing the data of each level targets a different problem. For example, storing the environmental and infrastructural parameters enables the portability of workflows between different parallel and distributed systems (grid, HPC, cloud). The descriptors of the workflow model enable tracking the different versions of the workflow and their impact on the execution. Our goal is to capture the optimal set of parameters, in both number and type, and to reconstruct the way the data were produced independently of the environment. In this paper we investigate the necessary and sufficient parameters of workflow reproducibility and give a mathematical formula to determine the rate of reproducibility. These measurements allow scientists to decide on the next steps toward creating reproducible workflows.
Keywords
cloud computing; electronic data interchange; grid computing; parallel processing; scientific information systems; HPC system; cloud system; component re-execution; data production; data provenance; distributed systems; environmental parameter; environmental provenance; execution dependency elimination; four-level provenance support; grid system; infrastructural parameter; infrastructural provenance; mathematical formula; optimal parameters; parallel systems; portable reproducibility; provenance information collection; reproducibility rate; scientific workflows; workflow execution reproducibility; workflow portability; workflow provenance; Best practices; Communities; Data models; Hardware; Mathematical model; Ports (Computers); Virtual machining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2015 38th International Convention on
Conference_Location
Opatija
Type
conf
DOI
10.1109/MIPRO.2015.7160272
Filename
7160272