Title :
An infrastructure for scalable parallel multidimensional analysis
Author :
Goil, Sanjay ; Choudhary, Alok
Author_Institution :
Technol. Inst., Northwestern Univ., Evanston, IL, USA
Abstract :
Multidimensional analysis in online analytical processing (OLAP) and in scientific and statistical databases (SSDB) uses operations that require summary information on multidimensional data sets. The most common are aggregate operations along one or more dimensions of numerical data values and/or on hierarchies defined on them. The Data Cube operator provides simultaneous calculation of multidimensional aggregates; it is computed only partially if the number of dimensions is large. Queries may either be answered from a materialized cube or calculated on the fly. The multidimensionality of the underlying problem can be represented in both relational and multidimensional databases, the latter being a better fit when query performance is the criterion for judgement. Relational databases are scalable in size for OLAP and multidimensional analysis, and efforts are under way to make their performance acceptable. Multidimensional databases, on the other hand, provide good performance for such queries, although they are not very scalable. We address scalability in multidimensional systems for analysis in SSDB and OLAP applications. We describe our system PARSIMONY (Parallel and Scalable Infrastructure for Multidimensional Online analytical processing). Sparsity of data sets is handled by using chunks that store data as a sparse set in a bit-encoded sparse structure. Chunks provide a multidimensional index structure for efficient dimension-oriented data accesses. Operations within and between chunks are a combination of relational and multidimensional operations, depending on whether the chunk is sparse or dense. Performance results for high-dimensional data sets on a distributed-memory parallel machine (IBM SP-2) show good speedup and scalability.
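The two ideas named in the abstract, the Data Cube operator and a bit-encoded sparse representation, can be illustrated with a minimal sketch. This is not PARSIMONY's actual code or storage layout, which the abstract does not specify; the function names (`data_cube`, `bits_needed`, `encode`, `decode`), the choice of sum as the aggregate, and the packing order are all assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def data_cube(rows, n_dims):
    """Sum the measure over every subset of dimensions (2**n_dims group-bys).

    Each row is (d0, d1, ..., d{n-1}, value). This naive version rescans the
    rows for every group-by; it only illustrates what the operator computes.
    """
    cube = {}
    for k in range(n_dims + 1):
        for dims in combinations(range(n_dims), k):
            agg = defaultdict(float)
            for *coords, value in rows:
                agg[tuple(coords[d] for d in dims)] += value
            cube[dims] = dict(agg)
    return cube


def bits_needed(size):
    """Number of bits required to index a dimension with `size` values."""
    return max(1, (size - 1).bit_length())


def encode(coords, sizes):
    """Pack per-dimension indices into one integer (hypothetical layout)."""
    key = 0
    for c, s in zip(coords, sizes):
        key = (key << bits_needed(s)) | c
    return key


def decode(key, sizes):
    """Recover the per-dimension indices from a packed integer."""
    coords = []
    for s in reversed(sizes):
        b = bits_needed(s)
        coords.append(key & ((1 << b) - 1))
        key >>= b
    return tuple(reversed(coords))


# Three non-empty cells of a sparse 2 x 3 x 4 array, stored as rows.
sizes = (2, 3, 4)
rows = [(0, 1, 2, 5.0), (0, 1, 0, 3.0), (1, 0, 2, 2.0)]
cube = data_cube(rows, n_dims=3)

# A sparse chunk in this sketch is a map from packed offset to value.
sparse_chunk = {encode(r[:3], sizes): r[3] for r in rows}
```

Storing only the packed offsets of non-empty cells keeps memory proportional to the number of values present rather than to the product of the dimension sizes, which is the motivation for a sparse chunk representation in general.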
Keywords :
data mining; data structures; parallel databases; relational databases; scientific information systems; statistical databases; storage management; Data Cube operator; IBM SP-2; OLAP applications; PARSIMONY; Parallel and Scalable Infrastructure for Multidimensional Online analytical processing; SSDB; aggregate operations; bit encoded sparse structure; chunks; data set sparsity; dimension oriented data accesses; distributed memory parallel machine; materialized cube; multidimensional aggregates; multidimensional analysis; multidimensional data sets; multidimensional databases; multidimensional index structure; multidimensionality; numerical data values; online analytical processing; query performance; relational databases; scalability; scalable parallel multidimensional analysis; scientific and statistical databases; sparse set; summary information; Aggregates; Data engineering; Data structures; Databases; Electronic switching systems; Information analysis; Multidimensional systems; Performance analysis; Read only memory; Statistical analysis;
Conference_Title :
Scientific and Statistical Database Management, 1999. Eleventh International Conference on
Conference_Location :
Cleveland, OH
Print_ISBN :
0-7695-0046-3
DOI :
10.1109/SSDM.1999.787625