  • DocumentCode
    2913503
  • Title
    The Effectiveness of Lloyd-Type Methods for the k-Means Problem
  • Author
    Ostrovsky, Rafail ; Rabani, Yuval ; Schulman, Leonard J. ; Swamy, Chaitanya

  • Author_Institution
    Dept. of Comput. Sci., California Univ., Los Angeles, CA
  • fYear
    2006
  • fDate
    Oct. 2006
  • Firstpage
    165
  • Lastpage
    176
  • Abstract
    We investigate variants of Lloyd's heuristic for clustering high dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
  • Keywords
    pattern clustering; probability; Lloyd-type iteration; Lloyd-type methods; clusterability criterion; high dimensional data clustering; k-means problem; near-optimal clustering solutions; probabilistic seeding process; Approximation algorithms; Clustering algorithms; Computer science; Cost function; Mathematics; Performance analysis; Polynomials; Sampling methods; Technological innovation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Foundations of Computer Science, 2006. FOCS '06. 47th Annual IEEE Symposium on
  • Conference_Location
    Berkeley, CA
  • ISSN
    0272-5428
  • Print_ISBN
    0-7695-2720-5
  • Type
    conf
  • DOI
    10.1109/FOCS.2006.75
  • Filename
    4031353