Title :
Regularized and sparse stochastic k-means for distributed large-scale clustering
Author :
Vilen Jumutc;Rocco Langone;Johan A. K. Suykens
Author_Institution :
KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
Abstract :
In this paper we present a novel clustering approach based on the stochastic learning paradigm and regularization with l1-norms. Our approach is an extension of the widely acknowledged K-Means algorithm. We introduce a simple regularized dual averaging scheme for learning prototype vectors (centroids) with l1-norms in a stochastic mode. We distribute the learning of the individual prototype vectors across clusters, and the re-assignment of cluster memberships is performed only for a fixed number of outer iterations. This re-assignment step is exactly the same as in the original K-Means algorithm and aims at re-shuffling the pool of samples per cluster according to the learned centroids. We report an extended evaluation and comparison of our approach with other clustering techniques such as randomized K-Means and Proximal Plane Clustering. Our experimental studies indicate the usefulness of the proposed methods for obtaining better prototype vectors and corresponding cluster memberships while being able to perform feature selection by l1-norm minimization.
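The scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the objective, the step sizes, or the stopping rule. The sketch assumes a squared Euclidean loss per sample and the standard closed-form l1 regularized dual averaging update (soft-thresholding of the running average gradient). The function names `l1_rda_centroid` and `sparse_stochastic_kmeans` and the parameters `lam`, `gamma`, `n_steps`, and `n_outer` are hypothetical.

```python
import numpy as np


def l1_rda_centroid(X, lam=0.05, gamma=1.0, n_steps=200, rng=None):
    """Learn one sparse prototype from the rows of X.

    Assumed loss per sample: 0.5 * ||w - x||^2 (not confirmed by the abstract).
    Update: standard l1 regularized dual averaging with a sqrt(t) schedule.
    """
    rng = np.random.default_rng(rng)
    w = np.zeros(X.shape[1])
    g_bar = np.zeros(X.shape[1])  # running average of stochastic gradients
    for t in range(1, n_steps + 1):
        x = X[rng.integers(len(X))]  # draw one sample at random
        g = w - x                    # gradient of the assumed loss at w
        g_bar += (g - g_bar) / t     # incremental mean of gradients
        # Soft-threshold the average gradient; entries with |g_bar| <= lam
        # produce exact zeros in the prototype (feature selection).
        shrunk = np.sign(g_bar) * np.maximum(np.abs(g_bar) - lam, 0.0)
        w = -(np.sqrt(t) / gamma) * shrunk
    return w


def sparse_stochastic_kmeans(X, k, n_outer=5, lam=0.05, seed=0):
    """Alternate membership re-assignment with per-cluster prototype learning."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()

    def assign(C):
        # Nearest-centroid assignment, as in the original K-Means.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_outer):  # fixed number of outer iterations
        labels = assign(centroids)
        # Each cluster is learned independently of the others, so this loop
        # is the part that could be distributed across workers.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = l1_rda_centroid(members, lam=lam, rng=rng)
    return centroids, assign(centroids)
```

In this sketch a larger `lam` zeroes out more coordinates of each prototype, and the inner loop over clusters runs sequentially only for simplicity.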
Keywords :
"Prototypes","Optimization","Silicon","Clustering algorithms","Big data","Stochastic processes","Partitioning algorithms"
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
DOI :
10.1109/BigData.2015.7364050