SUBSCALE: Fast and Scalable Subspace Clustering for High Dimensional Data

Author

Kaur, Amardeep ; Datta, Amitava

Author_Institution

Sch. of Comput. Sci. & Software Eng., Univ. of Western Australia, Perth, WA, Australia

fYear

2014

fDate

14 Dec. 2014

Firstpage

621

Lastpage

628

Abstract

The aim of subspace clustering is to find groups of similar data points in all possible subspaces of a dataset. Since the number of subspaces is exponential in the number of dimensions, subspace clustering is usually computationally very expensive. The performance of existing algorithms deteriorates drastically as the number of dimensions increases. Most of them use a bottom-up search strategy, and there are two main reasons for their inefficiency: (1) multiple database scans, and (2) either implicit or explicit generation of trivial subspace clusters during the process. We present SUBSCALE, a novel algorithm that directly finds the non-trivial subspace clusters at minimal cost and requires only k database scans for a k-dimensional dataset. Our algorithm scales very well with dimensionality and is highly parallelizable. The experimental evaluation has shown promising results.
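The abstract's point that the number of subspaces is exponential in the number of dimensions can be made concrete. Treating a subspace as any non-empty subset of the k dimensions (the axis-parallel setting usual in subspace clustering, assumed here), each dimension is either in or out, so there are 2^k - 1 candidate subspaces. The minimal Python sketch below only illustrates that count; it is not part of SUBSCALE, and the function names are hypothetical.

```python
from itertools import combinations


def axis_parallel_subspaces(k):
    """Yield every non-empty subset of the dimension indices 0..k-1.

    Each subset is one axis-parallel subspace of a k-dimensional dataset.
    """
    dims = range(k)
    for size in range(1, k + 1):
        for subset in combinations(dims, size):
            yield subset


def count_subspaces(k):
    # Closed form: 2 choices (in/out) per dimension, minus the empty subset.
    return 2 ** k - 1


if __name__ == "__main__":
    # Brute-force enumeration agrees with the closed form for small k.
    for k in (3, 10, 16):
        enumerated = sum(1 for _ in axis_parallel_subspaces(k))
        print(k, enumerated, count_subspaces(k))
    # For larger k only the closed form is practical.
    print(50, count_subspaces(50))
```

For k = 3 the generator yields (0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2), i.e. seven subspaces; already at k = 50 the count exceeds 10^15, which is why exhaustively visiting every subspace does not scale.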

Keywords

pattern clustering; SUBSCALE; bottom-up search strategy; database scans; explicit generation; high dimensional data; implicit generation; k-dimensional data set; nontrivial subspace clusters; subspace clustering; Australia; Clustering algorithms; Conferences; Data mining; Educational institutions; Indexing; High dimensional data; Subspace clustering

fLanguage

English

Publisher

IEEE

Conference_Title

Data Mining Workshop (ICDMW), 2014 IEEE International Conference on

Conference_Location

Shenzhen

Print_ISBN

978-1-4799-4275-6

Type

conf

DOI

10.1109/ICDMW.2014.100

Filename

7022654