Title :
Memory-constrained aggregate computation over data streams
Author :
Naidu, K.V.M. ; Rastogi, Rajeev ; Satkin, Scott ; Srinivasan, Anand
Author_Institution :
Yahoo! Labs. Bangalore, Bangalore, India
Abstract :
In this paper, we study the problem of efficiently computing multiple aggregation queries over a data stream. In order to share computation, prior proposals have suggested instantiating certain intermediate aggregates which are then used to generate the final answers for input queries. In this work, we make a number of important contributions aimed at improving the execution and generation of query plans containing intermediate aggregates. These include: (1) a different hashing model, which has low eviction rates, and also allows us to accurately estimate the number of evictions, (2) a comprehensive query execution cost model based on these estimates, (3) an efficient greedy heuristic for constructing good low-cost query plans, (4) provably near-optimal and optimal algorithms for allocating the available memory to aggregates in the query plan when the input data distribution is Zipf-like and Uniform, respectively, and (5) a detailed performance study with real-life IP flow data sets, which show that our multiple aggregates computation techniques consistently outperform the best-known approach.
Keywords :
data handling; query processing; storage management; IP flow data sets; Zipf-like; aggregation queries; data streams; greedy heuristic; hashing model; input data distribution; memory-constrained aggregate computation; query execution cost model; query plan execution; query plan generation; Aggregates; Analytical models; Computational modeling; Data models; IP networks; Memory management; Resource management;
Conference_Titel :
Data Engineering (ICDE), 2011 IEEE 27th International Conference on
Conference_Location :
Hannover
Print_ISBN :
978-1-4244-8959-6
Electronic_ISBN :
1063-6382
DOI :
10.1109/ICDE.2011.5767860