Title :
COCA: More Accurate Multidimensional Histograms out of More Accurate Correlations Detection
Author :
Wei Cao ; Xiongpai Qin ; Shan Wang
Author_Institution :
Key Lab. of Data Eng. & Knowledge Eng., Renmin Univ. of China, Beijing
Abstract :
Detecting and exploiting correlations among columns in relational databases is of great value for query optimizers, which can then generate better query execution plans (QEPs). We propose entropy correlation coefficients, a more robust and informative metric than the chi-square test, to detect correlations among columns in large datasets. We introduce a novel yet simple kind of multi-dimensional synopsis, named COCA-Hist, to cope with different correlations in databases. With the aid of the precise metric of entropy correlation coefficients, correlations of various degrees can be detected effectively; when the correlation coefficients indicate mutual independence among columns, the AVI (attribute value independence) assumption can be adopted with confidence. Like CORDS, COCA can also serve as a data-mining tool with superior qualities. We demonstrate the effectiveness and accuracy of our approach through several experiments.
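The abstract does not give the formula for the entropy correlation coefficient, so the following is only a minimal sketch of the general idea, not the paper's definition. It assumes one common entropy-based normalization (symmetric uncertainty, 2·I(X;Y) / (H(X) + H(Y))), which is 0 for independent columns and 1 when each column determines the other. The function names `entropy`, `entropy_correlation`, and `avi_selectivity` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of a column."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_correlation(xs, ys):
    """Assumed normalization: 2 * I(X;Y) / (H(X) + H(Y)), a value in [0, 1].

    This is symmetric uncertainty, used here as a stand-in; the paper's
    exact coefficient may be defined differently.
    """
    hx = entropy(xs)
    hy = entropy(ys)
    hxy = entropy(list(zip(xs, ys)))  # joint entropy H(X, Y)
    if hx + hy == 0.0:
        return 0.0  # both columns are constant, nothing to correlate
    mutual_info = max(hx + hy - hxy, 0.0)  # clamp tiny negative rounding error
    return 2.0 * mutual_info / (hx + hy)


def avi_selectivity(xs, ys, a, b):
    """Selectivity of (X = a AND Y = b) under the AVI assumption:
    the product of the two single-column selectivities."""
    n = len(xs)
    return (xs.count(a) / n) * (ys.count(b) / n)


if __name__ == "__main__":
    col_a = [0, 0, 1, 1]
    col_b = [0, 1, 0, 1]  # independent of col_a in this sample
    print(entropy_correlation(col_a, col_b))
    print(entropy_correlation(col_a, col_a))
    print(avi_selectivity(col_a, col_b, 0, 1))
```

In this sketch, a coefficient near 0 is the case where multiplying single-column selectivities is a reasonable estimate, while a coefficient well above 0 suggests the column pair is a candidate for a multidimensional histogram.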
Keywords :
data mining; relational databases; COCA; COCA-Hist; CORDS; attribute value independence; chi-square test; correlations detection; data-mining tool; entropy correlation coefficients; multi-dimensional synopses; multidimensional histograms; query execution plans; query optimizers; Data engineering; Entropy; Feedback; Histograms; Information management; Knowledge engineering; Laboratories; Multidimensional systems; Sampling methods; Testing; correlation coefficients; correlations; selectivity estimation
Conference_Title :
Web-Age Information Management, 2008. WAIM '08. The Ninth International Conference on
Conference_Location :
Zhangjiajie, Hunan
Print_ISBN :
978-0-7695-3185-4
Electronic_ISBN :
978-0-7695-3185-4
DOI :
10.1109/WAIM.2008.21