DocumentCode :
2181790
Title :
Cleansing uncertain databases leveraging aggregate constraints
Author :
Chen, Haiquan ; Ku, Wei-Shinn ; Wang, Haixun
Author_Institution :
Dept. of Comput. Sci. & Software Eng., Auburn Univ., Auburn, AL, USA
fYear :
2010
fDate :
1-6 March 2010
Firstpage :
128
Lastpage :
135
Abstract :
Emerging uncertain database applications often involve the cleansing (conditioning) of uncertain databases using additional information as new evidence for reducing the uncertainty. However, past researches on conditioning probabilistic databases, unfortunately, only focus on functional dependency. In real world applications, most additional information on uncertain data sets can be acquired in the form of aggregate constraints (e.g., the aggregate results are published online for various statistical purposes). Therefore, if these aggregate constraints can be taken into account, uncertainty in data sets can be largely reduced. However, finding a practical method to exploit aggregate constraints to decrease uncertainty is a very challenging problem. In this paper, we present three approaches to cleanse (condition) uncertain databases by employing aggregate constraints. Because the problem is NP-hard, we focus on the two approximation strategies by modeling the problem as a nonlinear optimization problem and then utilizing Simulated Annealing (SA) and Evolutionary Algorithm (EA) to sample from the entire solution space of possible worlds. In order to favor those possible worlds holding higher probabilities and satisfying all the constraints at the same time, we define Satisfaction Degree Functions (SDF) and then construct the objective function accordingly. Subsequently, based on the sample result, we remove duplicates, re-normalize the probabilities of all the qualified possible worlds, and derive the posterior probabilistic database. Our experiments verify the efficiency and effectiveness of our algorithms and show that our approximate approaches scale well to large-sized databases.
Keywords :
computational complexity; evolutionary computation; simulated annealing; statistical databases; uncertainty handling; NP hard problem; aggregate constraints; evolutionary algorithm; nonlinear optimization problem; objective function; satisfaction degree functions; simulated annealing; uncertain databases cleansing; Aggregates; Application software; Asia; Computer science; Databases; Equations; Information retrieval; Remuneration; Software engineering; Uncertainty;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Engineering Workshops (ICDEW), 2010 IEEE 26th International Conference on
Conference_Location :
Long Beach, CA
Print_ISBN :
978-1-4244-6522-4
Electronic_ISBN :
978-1-4244-6521-7
Type :
conf
DOI :
10.1109/ICDEW.2010.5452733
Filename :
5452733
Link To Document :
بازگشت