DocumentCode
1205791
Title
Communication and memory optimal parallel data cube construction
Author
Jin, Ruoming ; Vaidyanathan, Karthikeyan ; Yang, Ge ; Agrawal, Gagan
Author_Institution
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Volume
16
Issue
12
fYear
2005
Firstpage
1105
Lastpage
1119
Abstract
Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. This paper addresses a number of algorithmic issues in parallel data cube construction. First, we present an aggregation tree for sequential (and parallel) data cube construction, which has minimally bounded memory requirements. An aggregation tree is parameterized by the ordering of dimensions. We present a parallel algorithm based upon the aggregation tree. We analyze the interprocessor communication volume and construct a closed form expression for it. We prove that the same ordering of the dimensions in the aggregation tree minimizes both the computational and communication requirements. We also describe a method for partitioning the initial array and prove that it minimizes the communication volume. Finally, in the cases when memory may be a bottleneck, we describe how tiling can help scale sequential and parallel data cube construction. Experimental results from implementation of our algorithms on a cluster of workstations show the effectiveness of our algorithms and validate our theoretical results.
Keywords
data mining; data warehouses; minimisation; parallel algorithms; tree data structures; aggregation tree; array partitioning; closed form expression; communication analysis; data warehouse; interprocessor communication; minimisation; optimal parallel data cube construction; parallel algorithm; parallel machines; sequential data cube construction; workstation clusters; Aggregates; Clustering algorithms; Concurrent computing; Data analysis; Data warehouses; Marketing and sales; Multidimensional systems; Parallel algorithms; Parallel machines; Performance analysis; Data warehouses; OLAP; communication analysis.; parallel algorithms;
fLanguage
English
Journal_Title
Parallel and Distributed Systems, IEEE Transactions on
Publisher
ieee
ISSN
1045-9219
Type
jour
DOI
10.1109/TPDS.2005.144
Filename
1524948
Link To Document