DocumentCode
3324582
Title
Fixed-Precision Approximate Continuous Aggregate Queries in Peer-to-Peer Databases
Author
Banaei-Kashani, Famoush ; Shahabi, Cyrus
Author_Institution
Dept. of Comput. Sci., Univ. of Southern California, Los Angeles, CA
fYear
2008
fDate
7-12 April 2008
Firstpage
1427
Lastpage
1429
Abstract
In this paper, we outline our efficient sample- based approach to answer fixed-precision approximate continuous aggregate queries in peer-to-peer databases. We describe our approach in the context of Digest, a two-tier system we have developed for correct and efficient query answering by sampling. With Digest, at the top tier we develop a query evaluation engine that uses the samples collected from the peer-to-peer database to continually estimate the running result of the approximate continuous aggregate query with guaranteed precision. For efficient query evaluation, we propose an extrapolation algorithm that predicts the evolution of the running result and adapts the frequency of the continual sampling occasions accordingly to avoid redundant samples. We also introduce a repeated sampling algorithm that draws on the correlation between the samples at successive sampling occasions and exploits linear regression to minimize the number of the samples derived at each occasion. At the bottom tier, we introduce a distributed sampling algorithm for random sampling (uniform and nonuniform) from peer- to-peer databases with arbitrary network topology and tuple distribution. Our sampling algorithm is based on the Metropolis Markov Chain Monte Carlo method that guarantees randomness of the sample with arbitrary small variation difference with the desired distribution, while it is comparable to optimal sampling in sampling cost/time. We evaluate the efficiency of Digest via simulation using real data.
Keywords
Markov processes; Monte Carlo methods; distributed algorithms; distributed databases; extrapolation; peer-to-peer computing; query processing; regression analysis; sampling methods; Digest two-tier system; Metropolis Markov chain Monte Carlo method; continual sampling occasions; distributed sampling algorithm; extrapolation algorithm; fixed-precision approximate continuous aggregate query answering; linear regression; network topology; peer-to-peer databases; query evaluation engine; random sampling; repeated sampling algorithm; successive sampling occasions; tuple distribution; Aggregates; Databases; Engines; Extrapolation; Frequency; Linear regression; Peer to peer computing; Prediction algorithms; Query processing; Sampling methods;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location
Cancun
Print_ISBN
978-1-4244-1836-7
Electronic_ISBN
978-1-4244-1837-4
Type
conf
DOI
10.1109/ICDE.2008.4497578
Filename
4497578
Link To Document