DocumentCode :
3122935
Title :
Aggregate Query Answering under Uncertain Schema Mappings
Author :
Gal, Avigdor ; Martinez, Maria Vanina ; Simari, Gerardo I. ; Subrahmanian, V.S.
Author_Institution :
Technion - Israel Inst. of Technol., Haifa
fYear :
2009
fDate :
March 29 2009-April 2 2009
Firstpage :
940
Lastpage :
951
Abstract :
Recent interest in managing uncertainty in data integration has led to the introduction of probabilistic schema mappings and the use of probabilistic methods to answer queries across multiple databases using two semantics: by-table and by-tuple. In this paper, we develop three possible semantics for aggregate queries: the range, distribution, and expected value semantics, and show that these three semantics combine with the by-table and by-tuple semantics in six ways. We present algorithms to process COUNT, AVG, SUM, MIN, and MAX queries under all six semantics and develop results on the complexity of processing such queries under all six semantics. We show that computing COUNT is in PTIME for all six semantics and computing SUM is in PTIME for all but the by-tuple/distribution semantics. Finally, we show that AVG, MIN, and MAX are PTIME computable for all by-table semantics and for the by-tuple/range semantics.We developed a prototype implementation and experimented with both real-world traces and simulated data. We show that, as expected, naive processing of aggregates does not scale beyond small databases with a small number of mappings. The results also show that the polynomial time algorithms are scalable up to several million tuples as well as with a large number of mappings.
Keywords :
computational complexity; distributed databases; probability; query processing; statistical databases; aggregate query answering; by-table semantics; by-tuple semantics; computational complexity; data integration; distribution semantics; expected value semantics; multiple database; polynomial time algorithm; probabilistic schema mapping; query processing; range semantics; uncertain schema mapping; Aggregates; Conference management; Data engineering; Databases; Distributed computing; Educational institutions; Engineering management; Probability distribution; Technology management; Uncertainty; Aggregate Query Answering; Probabilistic Schema Mappings; Uncertainty;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
Conference_Location :
Shanghai
ISSN :
1084-4627
Print_ISBN :
978-1-4244-3422-0
Electronic_ISBN :
1084-4627
Type :
conf
DOI :
10.1109/ICDE.2009.55
Filename :
4812467
Link To Document :
بازگشت