Title of article :
Exploring the diversity in cluster ensemble generation: Random sampling and random projection
Author/Authors :
Yang, Fan and Li, Xuan and Li, Qianmu and Li, Tao
Issue Information :
Journal issue with serial number, year 2014
Pages :
23
From page :
4844
To page :
4866
Abstract :
A cluster ensemble first generates a large library of different clustering solutions and then combines them into a more accurate consensus clustering. It is commonly accepted that, for a cluster ensemble to work well, the member partitions should differ from one another while the quality of each partition remains at an acceptable level. Many strategies have been used to generate diverse base partitions for cluster ensembles. As in ensemble classification, many studies have focused on generating different partitions of the original dataset, i.e., clustering on different subsets (e.g., obtained using random sampling) or clustering in different feature spaces (e.g., obtained using random projection). However, little attention has been paid to the diversity and quality of the partitions generated by these two approaches. In this paper, we propose a novel cluster generation method based on random sampling, which uses the nearest neighbor method to fill in the category information of the missing (unsampled) samples (abbreviated as RS-NN). We evaluate its performance in comparison with the k-means ensemble, a typical random projection method (Random Feature Subset, abbreviated as FS), and another random sampling method (Random Sampling based on Nearest Centroid, abbreviated as RS-NC). Experimental results indicate that the FS method always generates more diverse partitions, while the RS-NC method generates high-quality partitions. Our proposed method, RS-NN, generates base partitions with a good balance between quality and diversity and achieves significant improvement over the alternative methods. Furthermore, to introduce more diversity, we propose a dual random sampling method that combines the RS-NN and FS methods. The proposed method can achieve higher diversity with good quality on most datasets.
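The RS-NN idea described in the abstract (cluster a random subset, then give each unsampled point the label of its nearest sampled neighbor) can be illustrated with a minimal sketch. This is not the authors' implementation: the use of plain k-means as the base clusterer, Euclidean distance, and the names `kmeans`, `rs_nn_partition`, and `sample_frac` are all assumptions made for illustration.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means (assumed base clusterer); returns integer labels."""
    X = np.asarray(X, dtype=float)
    # Initialise centers from k distinct data points (fancy indexing copies).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def rs_nn_partition(X, k, sample_frac, rng):
    """One base partition: cluster a random subset, then label every point
    by its nearest sampled neighbor (hypothetical helper, not from the paper)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    m = min(n, max(k, int(round(sample_frac * n))))
    idx = rng.choice(n, size=m, replace=False)   # random sampling step
    sample_labels = kmeans(X[idx], k, rng)       # cluster the subset only
    # Nearest sampled neighbor for all points; a sampled point is at
    # distance zero from itself, so it keeps its own cluster label.
    d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2)
    return sample_labels[d.argmin(axis=1)]


def generate_ensemble(X, k, n_members, sample_frac, seed=0):
    """Library of base partitions, one row per ensemble member."""
    rng = np.random.default_rng(seed)
    return np.stack([rs_nn_partition(X, k, sample_frac, rng)
                     for _ in range(n_members)])
```

Each call to `rs_nn_partition` draws a different subset, so the rows returned by `generate_ensemble` differ from one another, which is the source of diversity the abstract discusses. The consensus step that combines the rows is outside the scope of this sketch.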
Keywords :
Random projection , Ensemble generation , Random sampling , Ensemble clustering
Journal title :
Expert Systems with Applications
Serial Year :
2014
Record number :
2354854