Author :
Acuña, Ruben ; Lacroix, Zoé ; Chomilier, Jacques
Author_Institution :
Arizona State Univ., Tempe, AZ, USA
Abstract :
Scientific discovery relies on an experimental framework that corroborates hypotheses with experiments, which are complex, reproducible processes that generate and transform large datasets. The methods implicit in these processes capture the semantics of the data; they are therefore responsible for the generation of scientific information and the discovery of scientific knowledge. Scientific workflows provide the semantics needed to wrap scientific data from capture through analysis, publication, and archival. By annotating data with the processes that produce them, scientists no longer manage data but information, which allows meaningful interpretation and integration. Any change to a scientific workflow may significantly affect the quality of the data produced, their semantics, their future analysis, use, integration, and distribution, as well as execution performance. Yet scientific workflows are typically transformed over time: updated with new versions of the tools that compose them, extended with new functionality, and composed with other workflows. In this paper we discuss the various impacts of workflow transformation and illustrate them with a case study on the Structural Prediction for pRotein fOlding UTility System (SPROUTS) workflow.
Keywords :
biology computing; data mining; molecular biophysics; prediction theory; proteins; scientific information systems; SPROUTS; complex reproducible processes; data annotation; data semantics; datasets transformation; legacy biological workflows refurbishment; scientific data; scientific information generation; scientific knowledge discovery; scientific workflow; structural prediction for protein folding utility system workflow; workflow transformation; Bioinformatics; Databases; Decision support systems; Documentation; Proteins; Semantics; Terminology; Python; SPROUT; bioinformatics; database; legacy; refurbishing; restoration; server; workflow