Title :
Overcoming limitations of sampling for aggregation queries
Author :
Chaudhuri, Surajit ; Das, Gautam ; Datar, Mayur ; Motwani, Rajeev ; Narasayya, Vivek
Author_Institution :
Microsoft Corp., Redmond, WA, USA
Abstract :
We study the problem of approximately answering aggregation queries using sampling. We observe that uniform sampling performs poorly when the distribution of the aggregated attribute is skewed. To address this issue, we introduce a technique called outlier indexing. Uniform sampling is also ineffective for queries with low selectivity. We rely on weighted sampling based on workload information to overcome this shortcoming. We demonstrate that a combination of outlier indexing with weighted sampling can be used to answer aggregation queries with a significantly reduced approximation error compared to either uniform sampling or weighted sampling alone. We discuss the implementation of these techniques on Microsoft's SQL Server and present experimental results that demonstrate the merits of our techniques.
Keywords :
SQL; database indexing; file servers; query processing; relational databases; sampling methods; Microsoft SQL Server; aggregated attribute distribution; aggregation queries; approximate query answering; approximation error; outlier indexing; query selectivity; sampling limitations; skewed distribution; uniform sampling; weighted sampling; workload information; Aggregates; Computer science; Data mining; Query processing; Relational databases; Sampling methods
Conference_Title :
Data Engineering, 2001. Proceedings. 17th International Conference on
Conference_Location :
Heidelberg
Print_ISBN :
0-7695-1001-9
DOI :
10.1109/ICDE.2001.914867