• DocumentCode
    78324
  • Title

    Query Aware Determinization of Uncertain Objects

  • Author

    Xu, Jie; Kalashnikov, Dmitri V.; Mehrotra, Sharad

  • Author_Institution
    Dept. of Comput. Sci., Univ. of California at Irvine, Irvine, CA, USA
  • Volume
    27
  • Issue
    1
  • fYear
    2015
  • fDate
    Jan. 2015
  • Firstpage
    207
  • Lastpage
    221
  • Abstract
    This paper considers the problem of determinizing probabilistic data to enable such data to be stored in legacy systems that accept only deterministic input. Probabilistic data may be generated by automated data analysis/enrichment techniques such as entity resolution, information extraction, and speech processing. The legacy system may correspond to pre-existing web applications such as Flickr, Picasa, etc. The goal is to generate a deterministic representation of probabilistic data that optimizes the quality of the end-application built on deterministic data. We explore such a determinization problem in the context of two different data processing tasks: triggers and selection queries. We show that approaches such as thresholding or top-1 selection traditionally used for determinization lead to suboptimal performance for such applications. Instead, we develop a query-aware strategy and show its advantages over existing solutions through a comprehensive empirical evaluation over real and synthetic datasets.
  • Keywords
    Internet; data analysis; query processing; Web applications; automated data analysis-enrichment techniques; legacy systems; probabilistic data deterministic representation; probabilistic data determinization; query aware determinization; selection queries; trigger; uncertain objects; Approximation algorithms; Data processing; Earthquakes; Measurement; Probabilistic logic; Speech; Speech recognition; branch and bound algorithm; data quality; determinization; query workload; uncertain data
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2013.170
  • Filename
    6654145