• DocumentCode
    243628
  • Title
    SUBSCALE: Fast and Scalable Subspace Clustering for High Dimensional Data
  • Author
    Kaur, Amardeep ; Datta, Amitava
  • Author_Institution
    Sch. of Comput. Sci. & Software Eng., Univ. of Western Australia, Perth, WA, Australia
  • fYear
    2014
  • fDate
    14-14 Dec. 2014
  • Firstpage
    621
  • Lastpage
    628
  • Abstract
    The aim of subspace clustering is to find groups of similar data points in all possible subspaces of a dataset. Since the number of subspaces is exponential in the number of dimensions, subspace clustering is usually computationally very expensive. The performance of existing algorithms deteriorates drastically as the number of dimensions increases. Most of them use a bottom-up search strategy, and there are two main reasons for their inefficiency: (1) multiple database scans and (2) either implicit or explicit generation of trivial subspace clusters during the process. We present SUBSCALE, a novel algorithm that directly finds the non-trivial subspace clusters with minimal cost and requires only k database scans for a k-dimensional data set. Our algorithm scales very well with the dimensionality and is highly parallelizable. The experimental evaluation has shown promising results.
  • Keywords
    pattern clustering; SUBSCALE; bottom-up search strategy; database scans; explicit generation; high dimensional data; implicit generation; k-dimensional data set; nontrivial subspace clusters; subspace clustering; Australia; Clustering algorithms; Conferences; Data mining; Educational institutions; Indexing; Data mining; High dimensional data; Subspace clustering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • Print_ISBN
    978-1-4799-4275-6
  • Type
    conf
  • DOI
    10.1109/ICDMW.2014.100
  • Filename
    7022654