  • DocumentCode
    1398012
  • Title
    Data Cube Materialization and Mining over MapReduce
  • Author
    Nandi, Arnab; Yu, Cong; Bohannon, Philip; Ramakrishnan, Raghu
  • Author_Institution
    The Ohio State University, Columbus
  • Volume
    24
  • Issue
    10
  • fYear
    2012
  • Firstpage
    1747
  • Lastpage
    1759
  • Abstract
    Computing interesting measures for data cubes and subsequently mining interesting cube groups over massive data sets are critical for many important real-world analyses. Previous studies have focused on algebraic measures such as SUM, which are amenable to parallel computation and can easily benefit from recent advances in parallel computing infrastructure such as MapReduce. Dealing with holistic measures such as TOP-K, however, is nontrivial. In this paper, we detail real-world challenges in cube materialization and mining tasks on web-scale data sets. Specifically, we identify an important subset of holistic measures and introduce MR-Cube, a MapReduce-based framework for efficient cube computation and identification of interesting cube groups on holistic measures. We provide extensive experimental analyses over both real and synthetic data. We demonstrate that, unlike existing techniques, which cannot scale to the 100-million-tuple mark for our data sets, MR-Cube successfully and efficiently computes cubes with holistic measures over billion-tuple data sets.
  • Keywords
    Algorithm design and analysis; Data engineering; Data mining; Knowledge engineering; Data cube; MapReduce; cube materialization; cube mining; holistic measures
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Knowledge and Data Engineering
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2011.257
  • Filename
    6104048