Title :
Leveraging frequency and diversity based ensemble selection to consensus clustering
Author :
Banerjee, Adrish
Author_Institution :
Dept. of Math., Coll. of Eng. & Manage., Kolaghat, India
Abstract :
Consensus clustering, also called aggregation of clusterings (or partitions), is a method that aims to improve the robustness and quality of clustering a dataset by optimally reconciling the results of different clusterings of the same dataset, generated in different ways. This paper proposes a novel way of arriving at a consensus clustering through an ensemble selection strategy. Rather than considering the entire ensemble, the method judiciously selects a few clusterings from it without compromising the quality of the consensus. It begins by sorting the ensemble, prioritizing clusterings based on diversity and frequency. It is observed that jointly considering diversity and frequency helps identify a few representative partitions that have high potential to form a qualitatively better consensus than the entire ensemble. Finally, a greedy strategy is used to select clusterings within an iterative consensus generation technique that ensures the internal quality of the clustering is monotonically non-decreasing. Empirical results show that the consensus clustering obtained by the proposed algorithm gives better clustering accuracy for many datasets.
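The abstract does not define how frequency, diversity, the consensus function, or the internal quality index are computed, so the following is only a minimal sketch of the general idea under stated assumptions, not the author's algorithm. It assumes (hypothetically) that frequency counts ensemble members that are the same partition up to relabeling, that diversity is the mean disagreement (1 minus the Rand index) with the other distinct partitions, that the priority is frequency times diversity, that the consensus is built by thresholding a co-association matrix at 0.5 and taking connected components, and that internal quality is the mean silhouette width. The greedy loop accepts a candidate only if the silhouette does not drop, which is what makes the quality sequence monotonically non-decreasing. All function names are illustrative.

```python
import numpy as np

def canonical(labels):
    """Relabel clusters by order of first appearance so identical partitions compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(l, len(mapping)) for l in labels)

def rand_index(p, q):
    """Fraction of point pairs on which two partitions agree (same vs. different cluster)."""
    p, q = np.asarray(p), np.asarray(q)
    iu = np.triu_indices(len(p), k=1)
    return float(np.mean((p[:, None] == p[None, :])[iu] == (q[:, None] == q[None, :])[iu]))

def silhouette(X, labels):
    """Mean silhouette width (assumed internal quality index); -1.0 for a single cluster."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = []
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance to the rest of its cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

def coassociation_consensus(partitions, threshold=0.5):
    """Link points co-clustered in more than `threshold` of partitions; clusters = connected components."""
    P = np.asarray(partitions)
    C = np.mean([p[:, None] == p[None, :] for p in P], axis=0)
    labels = -np.ones(P.shape[1], dtype=int)
    current = 0
    for start in range(P.shape[1]):
        if labels[start] >= 0:
            continue
        labels[start], stack = current, [start]
        while stack:  # depth-first search over the thresholded co-association graph
            i = stack.pop()
            for j in np.nonzero(C[i] > threshold)[0]:
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def select_and_build(X, ensemble):
    counts = {}  # frequency of each distinct partition (up to relabeling)
    for labels in ensemble:
        key = canonical(labels)
        counts[key] = counts.get(key, 0) + 1
    distinct = list(counts)

    def diversity(p):  # mean disagreement with the other distinct partitions
        others = [q for q in distinct if q != p]
        return float(np.mean([1 - rand_index(p, q) for q in others])) if others else 0.0

    # Hypothetical priority: frequency times diversity, highest first.
    ranked = sorted(distinct, key=lambda p: counts[p] * diversity(p), reverse=True)
    selected = [ranked[0]]
    consensus = coassociation_consensus(selected)
    best = silhouette(X, consensus)
    for cand in ranked[1:]:
        trial = coassociation_consensus(selected + [cand])
        score = silhouette(X, trial)
        if score >= best:  # greedy: accept only if internal quality does not decrease
            selected, consensus, best = selected + [cand], trial, score
    return selected, consensus, best

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
ensemble = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1],
            [0, 1, 0, 1, 0, 1], [0, 0, 1, 1, 2, 2]]
selected, consensus, score = select_and_build(X, ensemble)
print(selected, consensus, round(score, 3))
```

On this toy input the frequent two-blob partition is ranked first, and both noisy candidates are rejected because adding either one fragments the thresholded consensus and lowers the silhouette, so the selected subset is much smaller than the ensemble.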
Keywords :
feature selection; iterative methods; pattern classification; pattern clustering; consensus clustering; dataset clustering; ensemble selection; iterative consensus generation technique; data clustering; clustering ensemble; clustering quality
Conference_Title :
Contemporary Computing (IC3), 2014 Seventh International Conference on
Conference_Location :
Noida
Print_ISBN :
978-1-4799-5172-7
DOI :
10.1109/IC3.2014.6897160