Title :
SUBSCALE: Fast and Scalable Subspace Clustering for High Dimensional Data
Author :
Kaur, Amardeep ; Datta, Amitava
Author_Institution :
Sch. of Comput. Sci. & Software Eng., Univ. of Western Australia, Perth, WA, Australia
Abstract :
The aim of subspace clustering is to find groups of similar data points in all possible subspaces of a dataset. Since the number of subspaces is exponential in the number of dimensions, subspace clustering is usually computationally very expensive. The performance of existing algorithms deteriorates drastically as the number of dimensions increases. Most of them use a bottom-up search strategy, and there are two main reasons for their inefficiency: (1) multiple database scans, and (2) either implicit or explicit generation of trivial subspace clusters during the process. We present SUBSCALE, a novel algorithm that directly finds the non-trivial subspace clusters with minimal cost and requires only k database scans for a k-dimensional data set. Our algorithm scales very well with the dimensionality and is highly parallelizable. The experimental evaluation has shown promising results.
Keywords :
pattern clustering; SUBSCALE; bottom-up search strategy; database scans; explicit generation; high dimensional data; implicit generation; k-dimensional data set; nontrivial subspace clusters; subspace clustering; Australia; Clustering algorithms; Conferences; Data mining; Educational institutions; Indexing
Conference_Title :
Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4799-4275-6
DOI :
10.1109/ICDMW.2014.100