DocumentCode
3675977
Title
A Quantitative Study on the Re-executability of Publicly Shared Scientific Workflows
Author
Rudolf Mayer;Andreas Rauber
Author_Institution
SBA Res., Vienna, Austria
fYear
2015
Firstpage
312
Lastpage
321
Abstract
Workflows have become a popular means of implementing experiments in the computational sciences. They have advantages over other forms of implementation: they require a formalisation of the experiment process, provide a standard set of functions, and abstract from the underlying system. Thus, they facilitate the understandability and repeatability of experimental research. In addition, metadata standards such as Research Objects, which allow more metadata about the research process to be recorded, are intended to improve the reproducibility of experiments. However, as several studies have shown, merely implementing an experiment as a workflow in a workflow engine is not sufficient to achieve these goals, as a number of challenges and pitfalls remain. In this paper, we quantify how many workflow executions are easy to repeat. To this end, we automatically obtain and analyse a set of almost 1,500 workflows available on the myExperiment platform, focusing on those authored in the Taverna workflow language. We provide statistics on the types of processing steps used, and investigate which vulnerabilities with regard to re-execution they face. We then try to execute the workflows automatically. From these results, we identify the most common causes of failure, and analyse how these can be countered with existing or yet-to-be-developed approaches.
Keywords
"Program processors","Engines","Java","Web services","Ports (Computers)","Libraries"
Publisher
IEEE
Conference_Titel
e-Science (e-Science), 2015 IEEE 11th International Conference on
Type
conf
DOI
10.1109/eScience.2015.58
Filename
7304314
Link To Document