• DocumentCode
    3248869
  • Title
    Adaptive dimension reduction for clustering high dimensional data
  • Author
    Ding, Chris ; He, Xiaofeng ; Zha, Hongyuan ; Simon, Horst D.
  • Author_Institution
    NERSC Div., California Univ., Berkeley, CA, USA
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    147
  • Lastpage
    154
  • Abstract
    It is well known that for high dimensional data clustering, standard algorithms such as EM and K-means are often trapped in a local minimum. Many initialization methods have been proposed to tackle this problem, with only limited success. In this paper we propose a new approach to resolve this problem by repeated dimension reductions such that K-means or EM is performed only in very low dimensions. Cluster membership is utilized as a bridge between the reduced dimensional subspace and the original space, providing flexibility and ease of implementation. Clustering analysis performed on highly overlapped Gaussians, DNA gene expression profiles and Internet newsgroups demonstrates the effectiveness of the proposed algorithm.
  • Keywords
    adaptive systems; data mining; pattern clustering; DNA gene expression profiles; EM algorithm; Internet newsgroups; K-means algorithm; adaptive dimension reduction; cluster membership; high dimensional data clustering; highly overlapped Gaussians; local minimum; reduced dimensional subspace; Algorithm design and analysis; Bridges; Clustering algorithms; Gaussian processes; Gene expression; Image analysis; Image processing; Information analysis; Performance analysis; Principal component analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
  • Print_ISBN
    0-7695-1754-4
  • Type
    conf
  • DOI
    10.1109/ICDM.2002.1183897
  • Filename
    1183897