DocumentCode
243628
Title
SUBSCALE: Fast and Scalable Subspace Clustering for High Dimensional Data
Author
Kaur, Amardeep ; Datta, Amitava
Author_Institution
Sch. of Comput. Sci. & Software Eng., Univ. of Western Australia, Perth, WA, Australia
fYear
2014
fDate
14-14 Dec. 2014
Firstpage
621
Lastpage
628
Abstract
The aim of subspace clustering is to find groups of similar data points in all possible subspaces of a dataset. Since the number of subspaces is exponential in the number of dimensions, subspace clustering is usually computationally very expensive. The performance of existing algorithms deteriorates drastically as the number of dimensions increases. Most of them use a bottom-up search strategy, and there are two main reasons for their inefficiency: (1) multiple database scans, and (2) implicit or explicit generation of trivial subspace clusters during the process. We present SUBSCALE, a novel algorithm that directly finds the non-trivial subspace clusters at minimal cost and requires only k database scans for a k-dimensional data set. Our algorithm scales very well with dimensionality and is highly parallelizable. The experimental evaluation has shown promising results.
Keywords
pattern clustering; SUBSCALE; bottom-up search strategy; database scans; explicit generation; high dimensional data; implicit generation; k-dimensional data set; nontrivial subspace clusters; subspace clustering; Australia; Clustering algorithms; Conferences; Data mining; Educational institutions; Indexing; Data mining; High dimensional data; Subspace clustering;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
Conference_Location
Shenzhen
Print_ISBN
978-1-4799-4275-6
Type
conf
DOI
10.1109/ICDMW.2014.100
Filename
7022654