  • DocumentCode
    8084
  • Title
    An Inference-Based Framework to Manage Data Provenance in Geoscience Applications
  • Author
    Huq, Mohammad Rezwanul ; Apers, Peter M. G. ; Wombacher, Andreas
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Twente, Enschede, Netherlands
  • Volume
    51
  • Issue
    11
  • fYear
    2013
  • fDate
    Nov. 2013
  • Firstpage
    5113
  • Lastpage
    5130
  • Abstract
    Data provenance allows scientists to validate their model as well as to investigate the origin of an unexpected value. Furthermore, it can be used as a replication recipe for output data products. However, capturing provenance requires enormous effort by scientists in terms of time and training. First, they need to design the workflow of the scientific model, i.e., workflow provenance, which requires both time and training. In practice, however, scientists may not document any workflow provenance before the model execution because they lack that time and training. Second, they need to capture provenance while the model is running, i.e., fine-grained data provenance. Explicit documentation of fine-grained provenance is not feasible because of the massive storage consumed by provenance data in applications, including those from the geoscience domain, where data arrive continuously and are processed. In this paper, we propose an inference-based framework, which provides both workflow and fine-grained data provenance at a minimal cost in terms of time, training, and disk consumption. Our proposed framework is applicable to any given scientific model, and is capable of handling different model dynamics, such as variation in the processing time as well as in the arrival pattern of input data products. Our evaluation of the framework in a real use case with geospatial data shows that the proposed framework is relevant and suitable for scientists in the geoscientific domain.
  • Keywords
    geographic information systems; geophysical techniques; geophysics computing; fine-grained data provenance; geoscience applications; inference-based framework; manage data provenance; massive storage consumption; output data products; scientific model workflow; Data models; Delays; Geospatial analysis; Irrigation; Mathematical model; Training; Data provenance; geoscience applications; hydrology; provenance graph; workflow
  • fLanguage
    English
  • Journal_Title
    Geoscience and Remote Sensing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0196-2892
  • Type
    jour
  • DOI
    10.1109/TGRS.2013.2247769
  • Filename
    6494280