• DocumentCode
    681319
  • Title
    A H-K clustering algorithm based on ensemble learning
  • Author
    Ying He; Jian Wang; Liang-xi Qin; Lin Mei; Yan-feng Shang; Wen-fei Wang
  • Author_Institution
    Cyber-Physical Systems R&D Center, Third Research Institute of the Ministry of Public Security, Shanghai, China
  • fYear
    2013
  • fDate
    19-20 Aug. 2013
  • Firstpage
    300
  • Lastpage
    305
  • Abstract
    The traditional H-K (hierarchical K-means) clustering algorithm overcomes the randomness of the initial centers in the K-means clustering algorithm and the need to specify them a priori. However, its high computational complexity leads to the curse of dimensionality when it is applied to high-dimensional datasets. Clustering ensembles apply ensemble learning techniques to obtain a better clustering result by learning from the merged set of multiple clustering results. The objective of this paper is to improve the performance of the traditional H-K clustering algorithm on high-dimensional datasets. Using ensemble learning, a new clustering algorithm named EPCAHK (Ensemble Principal Component Analysis Hierarchical K-means clustering algorithm) is proposed. In EPCAHK, the high-dimensional dataset is first mapped into a low-dimensional space using PCA. Next, the clustering results of the hierarchical stage, which supply initial information such as the number of clusters or the initial cluster centers, are integrated using the min-transitive closure method. Finally, the final clustering result is obtained by running the K-means clustering algorithm on the basis of the ensemble clustering result. The experimental results indicate that, compared with the traditional H-K clustering algorithm, EPCAHK obtains better clustering results: the average accuracy can reach 90% or above, and the stability on large high-dimensional datasets is also improved.
  • Keywords
    computational complexity; learning (artificial intelligence); pattern clustering; principal component analysis; EPCAHK; H-K clustering algorithm; dimensional disaster problem; ensemble learning technique; ensemble principle component analysis hierarchical k-means clustering algorithm; high dimensional dataset clustering; low dimensional space; min-transitive closure method; Ensemble Learning; H-K; Large High Dimensional Dataset; Min-transitive Closure; PCA;
  • fLanguage
    English
  • Publisher
    IET
  • Conference_Title
    IET International Conference on Smart and Sustainable City 2013 (ICSSC 2013)
  • Conference_Location
    Shanghai
  • Electronic_ISBN
    978-1-84919-707-6
  • Type
    conf
  • DOI
    10.1049/cp.2013.1976
  • Filename
    6737840
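The ensemble and K-means stages described in the Abstract above can be sketched as follows. This is a minimal, hypothetical illustration in standard-library Python, not the authors' implementation. It assumes the PCA projection has already been applied (that step is omitted) and that the base partitions from the hierarchical stage are given as label lists. It further assumes that the min-transitive closure is taken over a co-association matrix (the fraction of base clusterings that group two samples together) and that the consensus partition is read off with a lambda-cut; the paper may build its similarity relation differently. The function names and the threshold `lam` are illustrative choices, not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical sketch, not the authors' code: the ensemble step is assumed to be a
# max-min (min-transitive) closure of a co-association matrix followed by a lambda-cut.
Matrix = List[List[float]]


def co_association(labelings: Sequence[Sequence[int]]) -> Matrix:
    """Fraction of base clusterings that place samples i and j in the same cluster."""
    m, n = len(labelings), len(labelings[0])
    return [[sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
            for i in range(n)]


def max_min_closure(r: Matrix) -> Matrix:
    """Min-transitive closure: t[i][j] is the best 'weakest link' over all paths i -> j."""
    n = len(r)
    t = [row[:] for row in r]
    for k in range(n):  # Floyd-Warshall order, with (max, min) instead of (min, +)
        for i in range(n):
            for j in range(n):
                t[i][j] = max(t[i][j], min(t[i][k], t[k][j]))
    return t


def lambda_cut(t: Matrix, lam: float) -> List[int]:
    """Consensus labels: i and j share a cluster iff t[i][j] >= lam."""
    labels = [-1] * len(t)
    next_label = 0
    for i in range(len(t)):
        if labels[i] == -1:
            for j in range(len(t)):
                if t[i][j] >= lam:
                    labels[j] = next_label
            next_label += 1
    return labels


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def means(points, labels, k, previous: Optional[Matrix] = None) -> Matrix:
    """Per-cluster means; an empty cluster (only possible in K-means) keeps its old center."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    return [[s / counts[c] for s in sums[c]] if counts[c] else list(previous[c])
            for c in range(k)]


def kmeans(points, centers: Matrix, max_iter: int = 100) -> Tuple[List[int], Matrix]:
    """Plain Lloyd iterations starting from the given centers."""
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        centers = means(points, labels, len(centers), centers)
    return labels, centers


def ensemble_hk(points, labelings, lam: float = 0.5) -> Tuple[List[int], Matrix]:
    """Ensemble -> cluster number and initial centers -> K-means."""
    consensus = lambda_cut(max_min_closure(co_association(labelings)), lam)
    k = max(consensus) + 1              # number of clusters taken from the ensemble
    init = means(points, consensus, k)  # initial centers taken from the ensemble
    return kmeans(points, init)
```

For example, with the 2-D points `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` and three base labelings, one of which splits the first group, `ensemble_hk(points, labelings, lam=0.5)` recovers the two groups. The closure is computed with a Floyd-Warshall style loop in O(n^3) time, so this sketch only suits modest sample counts.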