Title :
Consensus Clustering on big data
Author :
Hongfu Liu ; Gong Cheng ; Junjie Wu
Author_Institution :
Northeastern Univ., Boston, MA, USA
Abstract :
Big Data clustering has become a hot topic with the rise of user-generated content. Although many clustering algorithms have been proposed and cloud computing resources are widely available, obtaining a high-quality partition efficiently remains an open problem. In this paper, we make full use of consensus clustering to handle Big Data clustering. Specifically, we use a divide-and-conquer strategy to split the whole Big Data set into small subsets, generate basic partitions from these subsets, and then apply consensus clustering to obtain the final result. For the consensus step, we apply K-means-based Consensus Clustering (KCC), which equivalently transforms the consensus clustering problem into a K-means-like optimization problem for high efficiency. Further, the sampling scheme is extended to two-sided sampling, which randomly samples instances and features simultaneously. Extensive experiments on eight real-world data sets demonstrate the advantages of KCC over some widely used methods. More importantly, its ability to handle incomplete basic partitions and its natural suitability for distributed computing make KCC a promising candidate for Big Data clustering.
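The consensus step described in the abstract can be illustrated with a minimal sketch. It assumes the simplest KCC instance (the categorical utility function with equal partition weights), for which the K-means-like problem reduces to ordinary K-means with squared Euclidean distance on the concatenated one-hot encoding of the basic partitions. Incomplete basic partitions are represented here by the label -1 for uncovered instances, and each partition's distances and centroids are computed only over the instances it covers. The function names `one_hot_blocks` and `kcc` and the `-1` convention are hypothetical choices for illustration, not the paper's implementation.

```python
import numpy as np


def one_hot_blocks(basic_partitions):
    """Encode each basic partition as a one-hot (binary) block.

    Each basic partition is a length-n array of integer cluster labels.
    Assumption: label -1 marks an instance the partition does not cover,
    which is how an incomplete basic partition is represented here.
    """
    blocks, masks = [], []
    for labels in basic_partitions:
        labels = np.asarray(labels)
        covered = labels >= 0
        block = np.zeros((labels.size, labels.max() + 1))
        block[covered, labels[covered]] = 1.0
        blocks.append(block)
        masks.append(covered)
    return blocks, masks


def kcc(basic_partitions, K, n_iter=100, init=None, seed=0):
    """Sketch of KCC under the categorical utility with equal weights:
    K-means with squared Euclidean distance on the one-hot blocks."""
    blocks, masks = one_hot_blocks(basic_partitions)
    n = blocks[0].shape[0]
    if init is None:
        # Balanced random start so no cluster is empty initially (n >= K).
        rng = np.random.default_rng(seed)
        assign = rng.permutation(np.arange(n) % K)
    else:
        assign = np.asarray(init).copy()
    for _ in range(n_iter):
        dist = np.zeros((n, K))
        for block, covered in zip(blocks, masks):
            # Centroid of cluster k in this block: mean over the members of k
            # that this partition covers (left at zero if there are none).
            cent = np.zeros((K, block.shape[1]))
            for k in range(K):
                members = covered & (assign == k)
                if members.any():
                    cent[k] = block[members].mean(axis=0)
            # Squared Euclidean distance to each centroid, counted only for
            # the instances this partition covers.
            d = ((block[:, None, :] - cent[None, :, :]) ** 2).sum(axis=2)
            dist += d * covered[:, None]
        new_assign = dist.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break  # assignments stable: converged
        assign = new_assign
    return assign


partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],    # same grouping, permuted label names
    [0, 0, -1, 1, 1, -1],  # incomplete: instances 2 and 5 not covered
]
print(kcc(partitions, K=2, init=[0, 1, 0, 1, 0, 1]))  # [0 0 0 1 1 1]
```

In the divide-and-conquer setting, a basic partition produced from a sampled subset of instances is naturally incomplete, so the uncovered instances would simply carry -1 in this encoding. As with K-means in general, the result depends on the initial assignment; the fixed `init` above only makes the small example reproducible.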
Keywords :
Big Data; learning (artificial intelligence); pattern clustering; random processes; sampling methods; Big Data clustering; Big Data disassembling; K-means-based consensus clustering; K-means-like optimization problem; KCC; data partitioning; data subsets; distributed computing; divide-and-conquer strategy; incomplete basic partition handling; random sampling; real-world data sets; two-sided sampling; user generated contents; Big data; Clustering algorithms; Convex functions; Linear programming; Optimization; Partitioning algorithms; Robustness;
Conference_Title :
Service Systems and Service Management (ICSSSM), 2015 12th International Conference on
Conference_Location :
Guangzhou
Print_ISBN :
978-1-4799-8327-8
DOI :
10.1109/ICSSSM.2015.7170344