DocumentCode
78324
Title
Query Aware Determinization of Uncertain Objects
Author
Xu, Jie ; Kalashnikov, Dmitri V. ; Mehrotra, Sharad
Author_Institution
Dept. of Comput. Sci., Univ. of California at Irvine, Irvine, CA, USA
Volume
27
Issue
1
fYear
2015
fDate
Jan. 2015
Firstpage
207
Lastpage
221
Abstract
This paper considers the problem of determinizing probabilistic data to enable such data to be stored in legacy systems that accept only deterministic input. Probabilistic data may be generated by automated data analysis/enrichment techniques such as entity resolution, information extraction, and speech processing. The legacy system may correspond to pre-existing web applications such as Flickr, Picasa, etc. The goal is to generate a deterministic representation of probabilistic data that optimizes the quality of the end-application built on deterministic data. We explore such a determinization problem in the context of two different data processing tasks: triggers and selection queries. We show that approaches such as thresholding or top-1 selection traditionally used for determinization lead to suboptimal performance for such applications. Instead, we develop a query-aware strategy and show its advantages over existing solutions through a comprehensive empirical evaluation over real and synthetic datasets.
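The two baseline strategies named in the abstract, thresholding and top-1 selection, can be sketched as below. This is a minimal illustration, not the authors' query-aware algorithm. It assumes (hypothetically) that an uncertain object is a mapping from candidate tags to the probability that each tag is correct, and that a selection query asks for all objects carrying one tag. The helper names `threshold_determinize`, `top1_determinize`, and `expected_outcome` are illustrative only; the last one scores a determinization by the expected true positives, false positives, and false negatives of a single-term query.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation of an uncertain object: candidate tag -> probability
# that the tag is correct. This is an assumption for illustration only.
UncertainObject = Dict[str, float]


def threshold_determinize(obj: UncertainObject, tau: float) -> Set[str]:
    """Thresholding: keep every tag whose probability is at least tau."""
    return {tag for tag, p in obj.items() if p >= tau}


def top1_determinize(obj: UncertainObject) -> Set[str]:
    """Top-1 selection: keep only the most probable tag (ties go to the
    alphabetically first tag, since max returns the first maximal element)."""
    if not obj:
        return set()
    best = max(sorted(obj), key=lambda tag: obj[tag])
    return {best}


def expected_outcome(
    objs: List[UncertainObject], det_sets: List[Set[str]], query: str
) -> Tuple[float, float, float]:
    """Expected (true positives, false positives, false negatives) when a
    single-term selection query runs over the deterministic tag sets."""
    tp = fp = fn = 0.0
    for obj, tags in zip(objs, det_sets):
        p = obj.get(query, 0.0)  # probability the object truly has the tag
        if query in tags:        # object is returned by the query
            tp += p
            fp += 1.0 - p
        else:                    # object is missed by the query
            fn += p
    return tp, fp, fn


if __name__ == "__main__":
    # Toy data, e.g. tags produced by a speech recognizer (made up for illustration).
    objs = [
        {"earthquake": 0.75, "earth": 0.25},
        {"flood": 0.5, "earthquake": 0.25, "storm": 0.25},
    ]
    top1 = [top1_determinize(o) for o in objs]
    low_tau = [threshold_determinize(o, 0.25) for o in objs]
    print(expected_outcome(objs, top1, "earthquake"))     # (0.75, 0.25, 0.25)
    print(expected_outcome(objs, low_tau, "earthquake"))  # (1.0, 1.0, 0.0)
```

In this toy example, top-1 selection misses the low-probability "earthquake" tag on the second object, while a low threshold recovers it at the cost of more expected false positives. Which trade-off is preferable depends on the query and the quality metric, which is the gap a query-aware strategy aims to close.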
Keywords
Internet; data analysis; query processing; Web applications; automated data analysis-enrichment techniques; legacy systems; probabilistic data deterministic representation; probabilistic data determinization; query aware determinization; selection queries; trigger; uncertain objects; Approximation algorithms; Data processing; Earthquakes; Measurement; Probabilistic logic; Speech; Speech recognition; Determinization; branch and bound algorithm; data quality; determinization; query workload; uncertain data;
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
ieee
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2013.170
Filename
6654145