• DocumentCode
    1504179
  • Title

    Robust Clustering Using Outlier-Sparsity Regularization

  • Author

    Forero, Pedro A. ; Kekatos, Vassilis ; Giannakis, Georgios B.

  • Author_Institution
    Electr. & Comput. Eng. Dept., Univ. of Minnesota, Minneapolis, MN, USA
  • Volume
    60
  • Issue
    8
  • fYear
    2012
  • Firstpage
    4163
  • Lastpage
    4177
  • Abstract
    Notwithstanding the popularity of conventional clustering algorithms such as K-means and probabilistic clustering, their clustering results are sensitive to the presence of outliers in the data. Even a few outliers can compromise the ability of these algorithms to identify meaningful hidden structures rendering their outcome unreliable. This paper develops robust clustering algorithms that not only aim to cluster the data, but also to identify the outliers. The novel approaches rely on the infrequent presence of outliers in the data, which translates to sparsity in a judiciously chosen domain. Leveraging sparsity in the outlier domain, outlier-aware robust K-means and probabilistic clustering approaches are proposed. Their novelty lies on identifying outliers while effecting sparsity in the outlier domain through carefully chosen regularization. A block coordinate descent approach is developed to obtain iterative algorithms with convergence guarantees and small excess computational complexity with respect to their non-robust counterparts. Kernelized versions of the robust clustering algorithms are also developed to efficiently handle high-dimensional data, identify nonlinearly separable clusters, or even cluster objects that are not represented by vectors. Numerical tests on both synthetic and real datasets validate the performance and applicability of the novel algorithms.
  • Keywords
    computational complexity; data handling; iterative methods; pattern clustering; probability; vectors; block coordinate descent approach; computational complexity; data clustering; hidden structure; high-dimensional data handling; iterative algorithm; kernelized version; nonlinearly separable cluster identification; outlier domain; outlier-aware robust K-means; outlier-sparsity regularization; probabilistic clustering approach; robust clustering; vector; Clustering algorithms; Covariance matrix; Optimization; Probabilistic logic; Robustness; Signal processing algorithms; Vectors; (Block) coordinate descent; K-means; clustering; expectation-maximization algorithm; group-Lasso; kernel methods; mixture models; robustness; sparsity;
  • fLanguage
    English
  • Journal_Title
    Signal Processing, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1053-587X
  • Type

    jour

  • DOI
    10.1109/TSP.2012.2196696
  • Filename
    6190766