DocumentCode :
243628
Title :
SUBSCALE: Fast and Scalable Subspace Clustering for High Dimensional Data
Author :
Kaur, Amardeep ; Datta, Amitava
Author_Institution :
Sch. of Comput. Sci. & Software Eng., Univ. of Western Australia, Perth, WA, Australia
fYear :
2014
fDate :
14 Dec. 2014
Firstpage :
621
Lastpage :
628
Abstract :
The aim of subspace clustering is to find groups of similar data points in all possible subspaces of a dataset. Since the number of subspaces is exponential in the number of dimensions, subspace clustering is usually computationally very expensive. The performance of existing algorithms deteriorates drastically as the number of dimensions increases. Most of them use a bottom-up search strategy, and there are two main reasons for their inefficiency: (1) multiple database scans and (2) implicit or explicit generation of trivial subspace clusters during the process. We present SUBSCALE, a novel algorithm that directly finds the non-trivial subspace clusters with minimal cost and requires only k database scans for a k-dimensional data set. Our algorithm scales very well with the dimensionality and is highly parallelizable. The experimental evaluation has shown promising results.
Keywords :
pattern clustering; SUBSCALE; bottom-up search strategy; database scans; explicit generation; high dimensional data; implicit generation; k-dimensional data set; nontrivial subspace clusters; subspace clustering; Australia; Clustering algorithms; Conferences; Data mining; Educational institutions; Indexing; Data mining; High dimensional data; Subspace clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4799-4275-6
Type :
conf
DOI :
10.1109/ICDMW.2014.100
Filename :
7022654