Title : 
Stability Yields a PTAS for k-Median and k-Means Clustering
         
        
            Author : 
Awasthi, Pranjal ; Blum, Avrim ; Sheffet, Or
         
        
            Author_Institution : 
Carnegie Mellon Univ., Pittsburgh, PA, USA
         
        
        
        
        
            Abstract : 
We consider k-median clustering in finite metric spaces and k-means clustering in Euclidean spaces, in the setting where k is part of the input (not a constant). For the k-means problem, Ostrovsky et al. show that if the optimal (k - 1)-means clustering of the input is more expensive than the optimal k-means clustering by a factor of 1/ε^2, then one can achieve a (1 + f(ε))-approximation to the k-means optimal in time polynomial in n and k by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the (k - 1)-means optimal is more expensive than the k-means optimal by a factor 1 + α for some constant α > 0, we can obtain a PTAS. In particular, under this assumption, for any ε > 0 we achieve a (1 + ε)-approximation to the k-means optimal in time polynomial in n and k, and exponential in 1/ε and 1/α. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the k-median problem in finite metrics under the analogous assumption. For k-means, we in addition give a randomized algorithm with improved running time of n^{O(1)} (k log n)^{poly(1/ε, 1/α)}. Our technique also obtains a PTAS under the assumption of Balcan et al. that all (1 + α)-approximations are δ-close to a desired target clustering, in the case that all target clusters have size greater than δn and α > 0 is constant. Note that the motivation of Balcan et al. is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target. From this perspective, our improvement is that for k-means in Euclidean spaces we reduce the distance of the clustering found to the target from O(δ) to δ when all target clusters are large, and for k-median we improve the "largeness" condition needed to get exactly δ-close from O(δn) to δn. Our results are based on a new notion of clustering stability.
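
The stability assumption in the abstract (the optimal (k - 1)-means cost exceeds the optimal k-means cost by a factor of at least 1 + α) can be checked directly on toy inputs. The following is a minimal sketch of that check only, not the paper's PTAS: it computes both optima by brute-force enumeration, which is exponential in the number of points. The function names `kmeans_cost`, `optimal_kmeans_cost`, and `satisfies_stability` are hypothetical and introduced here for illustration.

```python
from itertools import product


def kmeans_cost(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        cost += sum(sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in members)
    return cost


def optimal_kmeans_cost(points, k):
    """Exact optimum over all assignments into at most k clusters.

    Illustration only: enumerates k**n labelings, so use tiny inputs.
    """
    return min(
        kmeans_cost(points, labels, k)
        for labels in product(range(k), repeat=len(points))
    )


def satisfies_stability(points, k, alpha):
    """Check the assumed condition OPT_{k-1} >= (1 + alpha) * OPT_k, for k >= 2."""
    opt_k = optimal_kmeans_cost(points, k)
    opt_k_minus_1 = optimal_kmeans_cost(points, k - 1)
    return opt_k_minus_1 >= (1 + alpha) * opt_k


# Two well-separated pairs of points: merging them into one cluster is costly.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(satisfies_stability(points, k=2, alpha=1.0))
```

On this toy instance the optimal 2-means cost is 1 and the optimal 1-means cost is 101, so the condition holds for α = 1. In practice the optima are not computable this way; the sketch only makes the assumption concrete.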
         
        
            Keywords : 
pattern clustering; stability; Euclidean spaces; Lloyd algorithm; PTAS; finite metric spaces; k-means clustering; k-median clustering; Approximation algorithms; Approximation methods; Clustering algorithms; Extraterrestrial measurements; Optimized production technology; Polynomials
         
        
        
        
            Conference_Titel : 
Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on
         
        
            Conference_Location : 
Las Vegas, NV
         
        
        
            Print_ISBN : 
978-1-4244-8525-3
         
        
        
            DOI : 
10.1109/FOCS.2010.36