DocumentCode :
3134247
Title :
MM-Cubing: computing Iceberg cubes by factorizing the lattice space
Author :
Shao, Zheng ; Han, Jiawei ; Xin, Dong
Author_Institution :
Illinois Univ. at Urbana-Champaign, Urbana, IL, USA
fYear :
2004
fDate :
21-23 June 2004
Firstpage :
213
Lastpage :
222
Abstract :
The data cube and iceberg cube computation problem has been studied by many researchers. Three major approaches have been developed in this direction: (1) top-down computation, represented by MultiWay array aggregation (Zhao et al., 1997), which utilizes shared computation and performs well on dense data sets; (2) bottom-up computation, represented by BUC (Beyer and Ramakrishnan, 1999), which takes advantage of Apriori pruning and performs well on sparse data sets; and (3) integrated top-down and bottom-up computation, represented by Star-Cubing (Xin et al., 2003), which takes advantage of both and achieves high performance in most cases. However, the performance of Star-Cubing degrades on very sparse data sets due to the additional cost introduced by its tree structure. None of the three approaches achieves uniformly high performance on all kinds of data sets. In this paper, we present a new approach that computes iceberg cubes by factorizing the lattice space according to the frequency of values. Unlike all previous dimension-based approaches, which do not recognize the importance of data distribution, this approach partitions the cube lattice into one dense subspace and several sparse subspaces. Based on this approach, a new method called MM-Cubing has been developed. MM-Cubing is highly adaptive to dense, sparse, or skewed data sets. Our performance study shows that MM-Cubing is efficient and achieves high performance across all kinds of data distributions.
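For readers unfamiliar with the problem, the sketch below illustrates what an iceberg cube is and how the bottom-up, Apriori-pruned strategy attributed to BUC in the abstract works, using COUNT as the measure and a minimum support `min_sup`. It is a generic illustration, not MM-Cubing and not the authors' code. The function names `iceberg_cube_buc` and `split_by_frequency` are hypothetical, and the per-dimension split into frequent and infrequent values is only an assumed, simplified reading of "frequency of values"; the paper's actual dense/sparse factorization criterion may differ.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set, Tuple

# A cube cell: one entry per dimension; None plays the role of "*" (aggregated away).
Cell = Tuple[Optional[str], ...]


def iceberg_cube_buc(rows: Sequence[Tuple[str, ...]], min_sup: int) -> Dict[Cell, int]:
    """Return every cell whose COUNT >= min_sup, computed bottom-up (BUC-style)."""
    n_dims = len(rows[0]) if rows else 0
    result: Dict[Cell, int] = {}

    def recurse(partition: List[Tuple[str, ...]], start: int, cell: List[Optional[str]]) -> None:
        # The caller only recurses into partitions that already satisfy min_sup.
        result[tuple(cell)] = len(partition)
        for d in range(start, n_dims):
            # Partition the current rows on dimension d.
            groups: Dict[str, List[Tuple[str, ...]]] = {}
            for r in partition:
                groups.setdefault(r[d], []).append(r)
            for value, sub in groups.items():
                # Apriori pruning: COUNT is anti-monotone, so if this cell fails
                # min_sup, none of its more specific descendants can pass either.
                if len(sub) >= min_sup:
                    cell[d] = value
                    recurse(sub, d + 1, cell)
                    cell[d] = None

    if len(rows) >= min_sup:
        recurse(list(rows), 0, [None] * n_dims)
    return result


def split_by_frequency(rows: Sequence[Tuple[str, ...]], dim: int,
                       threshold: int) -> Tuple[Set[str], Set[str]]:
    """Hypothetical helper: split one dimension's values into frequent and
    infrequent sets by a simple count threshold (an illustrative assumption)."""
    counts = Counter(r[dim] for r in rows)
    frequent = {v for v, c in counts.items() if c >= threshold}
    return frequent, set(counts) - frequent


if __name__ == "__main__":
    rows = [("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a1", "b2", "c1"), ("a2", "b1", "c1")]
    cube = iceberg_cube_buc(rows, min_sup=2)
    # Sort with "*" substituted for None so mixed None/str tuples compare safely.
    for cell, count in sorted(cube.items(),
                              key=lambda kv: tuple("*" if v is None else v for v in kv[0])):
        print(tuple("*" if v is None else v for v in cell), count)
    print(split_by_frequency(rows, 0, threshold=2))
```

On the four sample rows with `min_sup=2`, the sketch keeps seven cells (for example `(a1, *, *)` with count 3 and `(*, b1, c1)` with count 2) and never expands any cell containing `a2` or `b2`, since those values occur only once. This pruning is why bottom-up computation suits sparse data, whereas on dense data few cells are pruned and shared top-down aggregation pays off instead.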
Keywords :
data mining; data warehouses; matrix decomposition; sparse matrices; tree data structures; Iceberg cube computation; MM-Cubing; MultiWay array aggregation; OLAP; Star-Cubing; apriori pruning; bottom-up computation; cube lattice; data distribution; data warehousing; lattice space factorization; sparse data sets; top-down computation; tree structure; Costs; Degradation; Frequency; High performance computing; Lattices; Regression analysis; Tree data structures; Warehousing; Wavelet analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on
ISSN :
1099-3371
Print_ISBN :
0-7695-2146-0
Type :
conf
DOI :
10.1109/SSDM.2004.1311213
Filename :
1311213