Title :
Clustering high-dimensional data via random sampling and consensus
Author :
Traganitis, Panagiotis A. ; Slavakis, Konstantinos ; Giannakis, Georgios B.
Author_Institution :
Dept. of ECE & Digital Technol. Center, Univ. of Minnesota, Minneapolis, MN, USA
Abstract :
In response to the urgent need for learning tools tuned to big data analytics, the present paper introduces a feature selection approach to efficient clustering of high-dimensional vectors. The resultant method leverages random sampling and consensus (RANSAC) arguments, originally developed for robust regression tasks in computer vision, to yield novel dimensionality reduction schemes. The advocated random sampling and consensus K-means (RSC-Kmeans) algorithm can operate in either batch or sequential mode, with the latter affording a lower computational footprint than the former. Extensive numerical tests on synthetic and real datasets highlight the potential of the proposed algorithms and demonstrate their competitive performance relative to state-of-the-art random projection alternatives.
Keywords :
feature selection; pattern clustering; random processes; sampling methods; Big Data analytics; RANSAC arguments; RSC-Kmeans algorithm; batch modes; computational footprint; dimensionality reduction schemes; feature selection approach; high-dimensional data clustering; high-dimensional vector clustering; learning tools; numerical tests; random sampling-and-consensus K-means algorithm; real datasets; sequential modes; synthetic datasets; Accuracy; Big data; Clustering algorithms; Information processing; Pattern recognition; Robustness; Vectors; Clustering; K-means; high-dimensional data; random sampling and consensus
Conference_Title :
2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP)
Conference_Location :
Atlanta, GA
DOI :
10.1109/GlobalSIP.2014.7032128