DocumentCode :
1205791
Title :
Communication and memory optimal parallel data cube construction
Author :
Jin, Ruoming ; Vaidyanathan, Karthikeyan ; Yang, Ge ; Agrawal, Gagan
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Volume :
16
Issue :
12
Year :
2005
Firstpage :
1105
Lastpage :
1119
Abstract :
Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. This paper addresses a number of algorithmic issues in parallel data cube construction. First, we present an aggregation tree for sequential (and parallel) data cube construction, which has minimally bounded memory requirements. An aggregation tree is parameterized by the ordering of dimensions. We present a parallel algorithm based upon the aggregation tree. We analyze the interprocessor communication volume and construct a closed form expression for it. We prove that the same ordering of the dimensions in the aggregation tree minimizes both the computational and communication requirements. We also describe a method for partitioning the initial array and prove that it minimizes the communication volume. Finally, in the cases when memory may be a bottleneck, we describe how tiling can help scale sequential and parallel data cube construction. Experimental results from implementation of our algorithms on a cluster of workstations show the effectiveness of our algorithms and validate our theoretical results.
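As a rough illustration of what the abstract means by data cube construction, the sketch below computes every group-by of a small fact table, deriving each one from a parent that has one extra dimension. It is a naive, sequential, in-memory sketch and is not the aggregation tree or the parallel algorithm of the paper. The function name `data_cube`, the row layout (dimension values followed by one numeric measure), and the rule for picking a parent are all assumptions made for illustration.

```python
from itertools import combinations
from collections import defaultdict


def data_cube(rows, n_dims):
    """Return {dims: {key: sum}} for all 2**n_dims group-bys.

    rows   -- iterable of tuples (d0, d1, ..., d_{n-1}, measure)  [assumed layout]
    n_dims -- number of dimension columns
    """
    # Top of the lattice: aggregate on all dimensions at once.
    full = tuple(range(n_dims))
    base = defaultdict(int)
    for *key, measure in rows:
        base[tuple(key)] += measure
    cube = {full: dict(base)}

    # Walk down the lattice level by level. Each group-by is computed from
    # an already-computed parent with exactly one more dimension.
    for size in range(n_dims - 1, -1, -1):
        for dims in combinations(range(n_dims), size):
            # Hypothetical parent choice: add the smallest missing dimension.
            extra = next(d for d in range(n_dims) if d not in dims)
            parent_dims = tuple(sorted(dims + (extra,)))
            drop = parent_dims.index(extra)
            child = defaultdict(int)
            for key, total in cube[parent_dims].items():
                # Aggregate away the extra dimension.
                child[key[:drop] + key[drop + 1:]] += total
            cube[dims] = dict(child)
    return cube


# Three dimensions, four facts; the last column is the measure.
rows = [(0, 0, 0, 1), (0, 1, 0, 2), (1, 0, 1, 3), (1, 1, 1, 4)]
cube = data_cube(rows, 3)
```

Here `cube[(0,)]` holds the totals grouped by the first dimension only, and `cube[()]` holds the grand total. Which parent each group-by is computed from, and in what order, is the kind of choice the paper's aggregation tree and dimension ordering address; this sketch makes no attempt to bound memory or communication.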
Keywords :
data mining; data warehouses; minimisation; parallel algorithms; tree data structures; aggregation tree; array partitioning; closed form expression; communication analysis; interprocessor communication; optimal parallel data cube construction; parallel machines; sequential data cube construction; workstation clusters; Aggregates; Clustering algorithms; Concurrent computing; Data analysis; Marketing and sales; Multidimensional systems; Performance analysis; OLAP
Language :
English
Journal_Title :
IEEE Transactions on Parallel and Distributed Systems
Publisher :
IEEE
ISSN :
1045-9219
Type :
Journal article
DOI :
10.1109/TPDS.2005.144
Filename :
1524948