• DocumentCode
    2548993
  • Title
    COCA: More Accurate Multidimensional Histograms out of More Accurate Correlations Detection
  • Author
    Wei Cao ; Xiongpai Qin ; Shan Wang
  • Author_Institution
    Key Lab. of Data Eng. & Knowledge Eng., Renmin Univ. of China, Beijing
  • fYear
    2008
  • fDate
    20-22 July 2008
  • Firstpage
    429
  • Lastpage
    434
  • Abstract
    Detecting and exploiting correlations among columns in relational databases is of great value to query optimizers, which can use them to generate better query execution plans (QEPs). We propose a more robust and informative metric than the chi-square test, namely entropy correlation coefficients, to detect correlations among columns in large datasets. We introduce a novel yet simple kind of multidimensional synopsis, named COCA-Hist, to cope with different correlations in databases. With the aid of the precise metric of entropy correlation coefficients, correlations of various degrees can be detected effectively; when the correlation coefficients indicate mutual independence among columns, the AVI (attribute value independence) assumption can be adopted with confidence. Like CORDS, COCA can also serve as a data-mining tool, with superior qualities. We demonstrate the effectiveness and accuracy of our approach through several experiments.
  • Keywords
    data mining; relational databases; COCA; COCA-Hist; CORDS; attribute value independence; chi-square test; correlations detection; data-mining tool; entropy correlation coefficients; multi-dimensional synopses; multidimensional histograms; query execution plans; query optimizers; Data engineering; Entropy; Feedback; Histograms; Information management; Knowledge engineering; Laboratories; Multidimensional systems; Sampling methods; Testing; correlation coefficients; correlations; selectivity estimation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web-Age Information Management, 2008. WAIM '08. The Ninth International Conference on
  • Conference_Location
    Zhangjiajie, Hunan
  • Print_ISBN
    978-0-7695-3185-4
  • Electronic_ISBN
    978-0-7695-3185-4
  • Type
    conf
  • DOI
    10.1109/WAIM.2008.21
  • Filename
    4597044