DocumentCode :
3268939
Title :
Range cube: efficient cube computation by exploiting data correlation
Author :
Feng, Ying ; Agrawal, Divyakant ; El Abbadi, Amr ; Metwally, Ahmed
Author_Institution :
Dept. of Comput. Sci., California Univ., Santa Barbara, CA, USA
fYear :
2004
fDate :
30 March-2 April 2004
Firstpage :
658
Lastpage :
669
Abstract :
Data cube computation and representation are prohibitively expensive in terms of time and space. Prior work has focused on either reducing the computation time or condensing the representation of a data cube. We introduce range cubing as an efficient way to compute and compress the data cube without any loss of precision. A new data structure, range trie, is used to compress and identify correlation in attribute values, and compress the input dataset to effectively reduce the computational cost. The range cubing algorithm generates a compressed cube, called range cube, which partitions all cells into disjoint ranges. Each range represents a subset of cells with the same aggregation value, as a tuple which has the same number of dimensions as the input data tuples. The range cube preserves the roll-up/drill-down semantics of a data cube. Compared to H-cubing, experiments on real dataset show a running time of less than one thirtieth, still generating a range cube of less than one ninth of the space of the full cube, when both algorithms run in their preferred dimension orders. On synthetic data, range cubing demonstrates much better scalability, as well as higher adaptiveness to both data sparsity and skew.
Keywords :
computational complexity; data structures; data warehouses; H-cubing; aggregation value; attribute value; compressed cube; data correlation; data cube computation; data cube representation; data structure; disjoint range; drill-down semantics; preferred dimension order; range cube; range trie; real dataset; roll-up semantics; synthetic data; Cities and towns; Computational efficiency; Computer science; Data analysis; Data structures; Data warehouses; Multidimensional systems; Partitioning algorithms; Portable computers; Space technology;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Engineering, 2004. Proceedings. 20th International Conference on
ISSN :
1063-6382
Print_ISBN :
0-7695-2065-0
Type :
conf
DOI :
10.1109/ICDE.2004.1320035
Filename :
1320035
Link To Document :
بازگشت