DocumentCode
8084
Title
An Inference-Based Framework to Manage Data Provenance in Geoscience Applications
Author
Huq, Mohammad Rezwanul ; Apers, Peter M. G. ; Wombacher, Andreas
Author_Institution
Dept. of Comput. Sci., Univ. of Twente, Enschede, Netherlands
Volume
51
Issue
11
fYear
2013
fDate
Nov. 2013
Firstpage
5113
Lastpage
5130
Abstract
Data provenance allows scientists to validate their models and to investigate the origin of an unexpected value. It can also serve as a replication recipe for output data products. However, capturing provenance demands considerable effort from scientists in terms of time and training. First, they need to design the workflow of the scientific model, i.e., workflow provenance; in practice, scientists may not document any workflow provenance before model execution because they lack the time and training. Second, they need to capture provenance while the model is running, i.e., fine-grained data provenance. Explicitly documenting fine-grained provenance is not feasible because provenance data consume massive storage in applications, including those in the geoscience domain, where data arrive and are processed continuously. In this paper, we propose an inference-based framework that provides both workflow and fine-grained data provenance at minimal cost in terms of time, training, and disk consumption. The proposed framework is applicable to any given scientific model and can handle different model dynamics, such as variation in processing time and in the arrival pattern of input data products. Our evaluation of the framework in a real use case with geospatial data shows that it is relevant and suitable for scientists in the geoscientific domain.
Keywords
geographic information systems; geophysical techniques; geophysics computing; fine-grained data provenance; geoscience applications; inference-based framework; manage data provenance; massive storage consumption; output data products; scientific model workflow; Data models; Delays; Geospatial analysis; Irrigation; Mathematical model; Training; Data provenance; hydrology; provenance graph; workflow
fLanguage
English
Journal_Title
Geoscience and Remote Sensing, IEEE Transactions on
Publisher
IEEE
ISSN
0196-2892
Type
jour
DOI
10.1109/TGRS.2013.2247769
Filename
6494280