• DocumentCode
    1205791
  • Title

    Communication and memory optimal parallel data cube construction

  • Author

    Jin, Ruoming ; Vaidyanathan, Karthikeyan ; Yang, Ge ; Agrawal, Gagan

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • Volume
    16
  • Issue
    12
  • fYear
    2005
  • Firstpage
    1105
  • Lastpage
    1119
  • Abstract
    Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. This paper addresses a number of algorithmic issues in parallel data cube construction. First, we present an aggregation tree for sequential (and parallel) data cube construction, which has minimally bounded memory requirements. An aggregation tree is parameterized by the ordering of dimensions. We present a parallel algorithm based upon the aggregation tree. We analyze the interprocessor communication volume and construct a closed form expression for it. We prove that the same ordering of the dimensions in the aggregation tree minimizes both the computational and communication requirements. We also describe a method for partitioning the initial array and prove that it minimizes the communication volume. Finally, in the cases when memory may be a bottleneck, we describe how tiling can help scale sequential and parallel data cube construction. Experimental results from implementation of our algorithms on a cluster of workstations show the effectiveness of our algorithms and validate our theoretical results.
  • Keywords
    data mining; data warehouses; minimisation; parallel algorithms; tree data structures; aggregation tree; array partitioning; closed form expression; communication analysis; data warehouse; interprocessor communication; minimisation; optimal parallel data cube construction; parallel algorithm; parallel machines; sequential data cube construction; workstation clusters; Aggregates; Clustering algorithms; Concurrent computing; Data analysis; Data warehouses; Marketing and sales; Multidimensional systems; Parallel algorithms; Parallel machines; Performance analysis; Data warehouses; OLAP; communication analysis.; parallel algorithms;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2005.144
  • Filename
    1524948