• DocumentCode
    802500
  • Title

    Regression Cubes with Lossless Compression and Aggregation

  • Author

    Chen, Yixin ; Dong, Guozhu ; Han, Jiawei ; Pei, Jian ; Wah, Benjamin W. ; Wang, Jianyong

  • Author_Institution
    Dept. of Comput. Sci., Washington Univ., St. Louis, MO
  • Volume
    18
  • Issue
    12
  • fYear
    2006
  • Firstpage
    1585
  • Lastpage
    1599
  • Abstract
    As OLAP engines are widely used to support multidimensional data analysis, it is desirable to support in data cubes advanced statistical measures, such as regression and filtering, in addition to the traditional simple measures such as count and average. Such new measures allow users to model, smooth, and predict the trends and patterns of data. Existing algorithms for simple distributive and algebraic measures are inadequate for efficient computation of statistical measures in a multidimensional space. In this paper, we propose a fundamentally new class of measures, compressible measures, in order to support efficient computation of the statistical models. For compressible measures, we compress each cell into an auxiliary matrix with a size independent of the number of tuples. We can then compute the statistical measures for any data cell from the compressed data of the lower-level cells without accessing the raw data. Time- and space-efficient lossless aggregation formulae are derived for regression and filtering measures. Our analytical and experimental studies show that the resulting system, regression cube, substantially reduces the memory usage and the overall response time for statistical analysis of multidimensional data.
  • Keywords
    data analysis; data compression; data mining; data warehouses; regression analysis; OLAP engines; data cubes; lossless aggregation; lossless compression; multidimensional data analysis; regression cubes; statistical analysis; Data analysis; Delay; Distributed computing; Engines; Extraterrestrial measurements; Filtering; Loss measurement; Multidimensional systems; Predictive models; Size measurement; Aggregation; OLAP; compression
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2006.196
  • Filename
    1717417
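
The abstract's central idea, compressing each cell into a fixed-size summary and computing a higher-level cell's regression from the summaries of lower-level cells, can be illustrated with a minimal sketch. It assumes a simple linear model y = b0 + b1 * x and uses the textbook least-squares sufficient statistics (X^T X and X^T y); this is not necessarily the paper's own formulae, and the names `compress`, `merge`, and `fit` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical compressed form of one cell for the model y = b0 + b1 * x:
# (X^T X as a 2x2 matrix, X^T y as a 2-vector), where each row of X is [1, x].
Compressed = Tuple[List[List[float]], List[float]]


def compress(points: Sequence[Tuple[float, float]]) -> Compressed:
    """Summarize raw (x, y) tuples; the result size does not depend on len(points)."""
    n = float(len(points))
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return [[n, sx], [sx, sxx]], [sy, sxy]


def merge(a: Compressed, b: Compressed) -> Compressed:
    """Aggregate two cells over disjoint tuple sets by element-wise addition."""
    (ma, va), (mb, vb) = a, b
    m = [[ma[i][j] + mb[i][j] for j in range(2)] for i in range(2)]
    v = [va[i] + vb[i] for i in range(2)]
    return m, v


def fit(c: Compressed) -> Tuple[float, float]:
    """Solve the 2x2 normal equations (X^T X) b = X^T y by Cramer's rule."""
    m, v = c
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("x values are not distinct enough to fit a line")
    b0 = (v[0] * m[1][1] - m[0][1] * v[1]) / det
    b1 = (m[0][0] * v[1] - m[1][0] * v[0]) / det
    return b0, b1


# Two lower-level cells; the parent cell is fitted without touching raw tuples.
cell_a = compress([(0, 1), (1, 3)])
cell_b = compress([(2, 5), (3, 7)])
intercept, slope = fit(merge(cell_a, cell_b))
```

Because the summaries add element-wise, the parent cell's fit matches a fit over the pooled raw tuples, which is the sense in which this kind of aggregation loses no information for the chosen model.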