Regional Information Center for Science and Technology - Parallel ROLAP data cube construction on shared-nothing multiprocessors

DocumentCode :

1661098

Title :

Parallel ROLAP data cube construction on shared-nothing multiprocessors

Author :

Chen, Ying ; Dehne, Frank ; Eavis, Todd ; Rau-Chaplin, Andrew

Author_Institution :

Dalhousie Univ., Halifax, NS, Canada

Year :

2003

Abstract :

The pre-computation of data cubes is critical to improving the response time of on-line analytical processing (OLAP) systems and can be instrumental in accelerating data mining tasks in large data warehouses. To meet the need for improved performance created by growing data sizes, parallel solutions for generating the data cube are becoming increasingly important. The paper presents a parallel method for generating data cubes on a shared-nothing multiprocessor. Since no (expensive) shared disk is required, our method can be used on low-cost Beowulf-style clusters consisting of standard PCs with local disks connected via a data switch. Our approach uses a ROLAP representation of the data cube in which views are stored as relational tables. This allows for tight integration with current relational database technology. We have implemented our parallel shared-nothing data cube generation method and evaluated it on a PC cluster, exploring relative speedup, local vs. global schedule trees, data skew, cardinality of dimensions, data dimensionality, and balance tradeoffs. For an input data set of 2,000,000 rows (72 Megabytes), our parallel data cube generation method achieves close to optimal speedup, generating a full data cube of ≈227 million rows (5.6 Gigabytes) on a 16-processor cluster in under 6 minutes. For an input data set of 10,000,000 rows (360 Megabytes), our parallel method, running on a 16-processor PC cluster, created a data cube consisting of ≈846 million rows (21.7 Gigabytes) in under 47 minutes.

Keywords :

data mining; data warehouses; multiprocessing systems; parallel programming; relational databases; workstation clusters; 21.7 GByte; 5.6 GByte; PC cluster; balance tradeoffs; data cube pre-computation; data dimensionality; data mining tasks; data skew; data switch; global schedule trees; input data set; large data warehouses; local disks; local schedule trees; low cost Beowulf style clusters; on-line analytical processing systems; parallel ROLAP data cube construction; parallel shared-nothing data cube generation method; parallel solutions; relational database technology; relational tables; shared-nothing multiprocessors; standard PCs; Acceleration; Costs; Data mining; Data warehouses; Delay; Instruments; Personal communication networks; Processor scheduling; Relational databases; Switches;

Language :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Processing Symposium, 2003. Proceedings. International

ISSN :

1530-2075

Print_ISBN :

0-7695-1926-1

Type :

conf

DOI :

10.1109/IPDPS.2003.1213169

Filename :

1213169

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1661098