Title :
Full and partial data cube computation and representation over commodity PCs
Author :
Moreira, Angelica Aparecida ; de Castro Lima, Joubert
Author_Institution :
Fed. Univ. of Ouro Preto (UFOP), Belo Horizonte, Brazil
Abstract :
The PnP (Pipe 'n Prune) approach is considered one of the most promising approaches for partial cube computation over distributed-memory computer architectures; however, it generates a huge amount of redundant data. In general, PnP does not consider data skew (the non-uniform distribution of values) when partitioning its workload and thus imposes maximum data redundancy even on uniform data. To address this, we implement P2CDM (Parallel Cube Computation with Distributed Memory), an approach with minimized communication and low data redundancy. Globally, across the entire cluster, P2CDM automatically generates data redundancy only for skewed values among all dimensions of a data warehouse. Locally, at each host, P2CDM prunes cube cells using the MCG approach. The result is a distributed approach that computes massive full or partial data cubes over a cluster of commodity PCs. The experiments demonstrated that both approaches achieve similar speedup, but P2CDM is 20-25% faster and consumes 30-40% less memory at each host of the cluster than PnP.
Keywords :
data structures; data warehouses; MCG approach; P2CDM; PnP; commodity PC; cube cells pruning; data redundancy; data warehouse; distributed memory computer architecture; parallel cube computation; partial data cube computation; partial data cube representation; pipe 'n prune approach; Distributed databases; Memory management; Redundancy; Runtime; Silver;
Conference_Title :
Information Reuse and Integration (IRI), 2012 IEEE 13th International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4673-2282-9
Electronic_ISBN :
978-1-4673-2283-6
DOI :
10.1109/IRI.2012.6303074