DocumentCode :
681319
Title :
A H-K clustering algorithm based on ensemble learning
Author :
Ying He ; Jian Wang ; Liang-xi Qin ; Lin Mei ; Yan-feng Shang ; Wen-fei Wang
Author_Institution :
Cyber Phys. Syst. R&D Center, Third Res. Inst. of Minist. of Public Security, Shanghai, China
fYear :
2013
fDate :
19-20 Aug. 2013
Firstpage :
300
Lastpage :
305
Abstract :
The traditional H-K clustering algorithm addresses the randomness and a priori selection of the initial centers in the K-means clustering algorithm. However, its high computational complexity leads to the dimensional disaster problem (the curse of dimensionality) when it is applied to high-dimensional datasets. Clustering ensembles apply ensemble learning to obtain a better clustering result by learning from the merged output of multiple clusterings. The objective of this paper is to improve the performance of the traditional H-K clustering algorithm on high-dimensional datasets. Using ensemble learning, a new clustering algorithm named EPCAHK (Ensemble Principle Component Analysis Hierarchical K-means Clustering algorithm) is proposed. In the EPCAHK algorithm, the high-dimensional dataset is first mapped into a low-dimensional space using PCA. Subsequently, the clustering results of the hierarchical stage, which supply the initial information (e.g., the cluster number or the initial clustering centers), are integrated using the min-transitive closure method. Finally, the final clustering result is obtained by running the K-means clustering algorithm on the ensemble clustering results above. The experimental results indicate that, compared with the traditional H-K clustering algorithm, EPCAHK obtains a better clustering result. The average accuracy of the clustering results reaches 90% or above, and stability on large high-dimensional datasets is also improved.
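The integration step named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the hierarchical-stage results are encoded, so this assumes they are combined into a co-association matrix (the fraction of base clusterings that place two points together), which is then closed under max-min composition and cut at a threshold. The function names, the threshold, and the toy partitions are hypothetical and are not taken from the paper.

```python
def co_association(partitions):
    """Fraction of base clusterings in which points i and j share a cluster.

    Assumption: each partition is a list of cluster labels, one per point.
    """
    n = len(partitions[0])
    m = len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]


def min_transitive_closure(sim):
    """Smallest min-transitive relation containing `sim`.

    Repeats max-min composition until the matrix stops changing.
    """
    n = len(sim)
    current = [row[:] for row in sim]
    while True:
        updated = [[max(current[i][j],
                        max(min(current[i][k], current[k][j]) for k in range(n)))
                    for j in range(n)]
                   for i in range(n)]
        if updated == current:
            return current
        current = updated


def cut_clusters(closure, threshold):
    """Group points whose closed similarity is at least `threshold`.

    The closure is reflexive, symmetric, and min-transitive, so each cut
    yields a proper partition of the points.
    """
    n = len(closure)
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] == -1:
            for j in range(n):
                if closure[i][j] >= threshold:
                    labels[j] = next_label
            next_label += 1
    return labels


# Three hypothetical base clusterings of five points.
base_partitions = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
]
similarity = co_association(base_partitions)
closure = min_transitive_closure(similarity)
consensus = cut_clusters(closure, threshold=0.6)
```

Under these assumptions, the number of distinct labels in `consensus` would give the cluster number, and the mean of each group in the PCA-reduced space could seed the final K-means run. Both the PCA projection and the K-means step are omitted here.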
Keywords :
computational complexity; learning (artificial intelligence); pattern clustering; principal component analysis; EPCAHK; H-K clustering algorithm; computational complexity; dimensional disaster problem; ensemble learning technique; ensemble principle component analysis hierarchical k-means clustering algorithm; high dimensional dataset clustering; low dimensional space; min-transitive closure method; Ensemble Learning; H-K; Large High Dimensional Dataset; Min-transitive Closure; PCA;
fLanguage :
English
Publisher :
IET
Conference_Titel :
Smart and Sustainable City 2013 (ICSSC 2013), IET International Conference on
Conference_Location :
Shanghai
Electronic_ISBN :
978-1-84919-707-6
Type :
conf
DOI :
10.1049/cp.2013.1976
Filename :
6737840