Title :
Improving Accuracy and Robustness of Self-Tuning Histograms by Subspace Clustering
Author :
Khachatryan, Andranik ; Muller, Emmanuel ; Stier, Christian ; Bohm, Klemens
Author_Institution :
Res. & Educ. Center, Armsoft LLC, Yerevan, Armenia
Abstract :
In large databases, the amount and the complexity of the data calls for data summarization techniques. Such summaries are used to assist fast approximate query answering or query optimization. Histograms are a prominent class of model-free data summaries and are widely used in database systems. So-called self-tuning histograms look at query-execution results to refine themselves. An assumption with such histograms, which has not been questioned so far, is that they can learn the dataset from scratch, that is-starting with an empty bucket configuration. We show that this is not the case. Self-tuning methods are very sensitive to the initial configuration. Three major problems stem from this. Traditional self-tuning is unable to learn projections of multi-dimensional data, is sensitive to the order of queries, and reaches only local optima with high estimation errors. We show how to improve a self-tuning method significantly by starting with a carefully chosen initial configuration. We propose initialization by dense subspace clusters in projections of the data, which improves both accuracy and robustness of self-tuning. Our experiments on different datasets show that the error rate is typically halved compared to the uninitialized version.
Keywords :
database management systems; knowledge engineering; pattern clustering; query processing; data summarization techniques; database systems; large databases; model-free data summaries; query answering; query optimization; self-tuning histograms; self-tuning methods; subspace clustering; Accuracy; Correlation; Estimation error; Histograms; Merging; Query processing; Sensitivity; Adaptive histograms; Query Optimization; Query optimization; Selectivity Estimation; adaptive histograms; selectivity estimation; subspace clustering;
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2015.2416725