DocumentCode
659447
Title
4S: Scalable subspace search scheme overcoming traditional Apriori processing
Author
Hoang Vu Nguyen ; Muller, E. ; Bohm, K.
Author_Institution
Karlsruhe Inst. of Technol. (KIT), Karlsruhe, Germany
fYear
2013
fDate
6-9 Oct. 2013
Firstpage
359
Lastpage
367
Abstract
In many real-world applications, data is collected in multi-dimensional spaces. However, not all dimensions are relevant for data analysis. Instead, interesting knowledge is hidden in correlated subsets of dimensions (i.e., subspaces of the original space). Detecting these correlated subspaces independent of the underlying mining task is an open research problem. It is challenging due to the exponential search space. Existing methods have tried to tackle this by utilizing Apriori search schemes. However, they show poor scalability and miss high quality subspaces. This paper features a scalable subspace search scheme (4S), which overcomes the efficiency problem by departing from the traditional levelwise search. We propose a new generalized notion of correlated subspaces which gives way to transforming the search space to a correlation graph of dimensions. Then we perform a direct mining of correlated subspaces in the graph. Finally, we merge subspaces based on the MDL principle and obtain high dimensional subspaces with minimal redundancy. We theoretically show that our search scheme is more general than existing search schemes and has a significantly lower runtime complexity. Our experiments reveal that 4S scales near-linearly with both database size and dimensionality, and produces higher quality subspaces than state-of-the-art methods.
Keywords
computational complexity; data analysis; data mining; graph theory; redundancy; search problems; MDL principle; apriori processing; apriori search schemes; correlation graph; data analysis; direct mining; exponential search space; high dimensional subspaces; minimal redundancy; mining task; multidimensional spaces; open research problem; runtime complexity; scalable subspace search scheme; Buildings; Correlation; Pairwise error probability; Redundancy; Runtime; Scalability; Space exploration;
fLanguage
English
Publisher
ieee
Conference_Titel
Big Data, 2013 IEEE International Conference on
Conference_Location
Silicon Valley, CA
Type
conf
DOI
10.1109/BigData.2013.6691596
Filename
6691596
Link To Document