• DocumentCode
    779491
  • Title

    An efficient k-means clustering algorithm: analysis and implementation

  • Author

    Kanungo, Tapas ; Mount, David M. ; Netanyahu, Nathan S. ; Piatko, Christine D. ; Silverman, Ruth ; Wu, Angela Y.

  • Author_Institution
    Almaden Res. Center, San Jose, CA, USA
  • Volume
    24
  • Issue
    7
  • fYear
    2002
  • fDate
    7/1/2002 12:00:00 AM
  • Firstpage
    881
  • Lastpage
    892
  • Abstract
    In k-means clustering, we are given a set of n data points in d-dimensional space Rd and an integer k and the problem is to determine a set of k points in Rd, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd´s (1982) algorithm. We present a simple and efficient implementation of Lloyd´s k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm´s running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation
  • Keywords
    covariance matrices; filtering theory; pattern clustering; Lloyd algorithm; color quantization; data compression; data structure; data-sensitive analysis; filtering algorithm; image segmentation; k-means clustering algorithm; kd-tree; mean squared distance; Algorithm design and analysis; Clustering algorithms; Data analysis; Data compression; Data mining; Data structures; Filtering algorithms; Iterative algorithms; Machine learning algorithms; Pattern recognition;
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/TPAMI.2002.1017616
  • Filename
    1017616