• DocumentCode
    3234012
  • Title

    An improved K-Means clustering algorithm

  • Author

    Wang, Juntao ; Su, Xiaolong

  • Author_Institution
    Sch. of Comput. Sci. & Technol., China Univ. of Min. & Technol., Xuzhou, China
  • fYear
    2011
  • fDate
    27-29 May 2011
  • Firstpage
    44
  • Lastpage
    46
  • Abstract
    The K-Means clustering algorithm, proposed by MacQueen in 1967, is a partition-based cluster analysis method. It is widely used in cluster analysis because it is efficient and scalable and converges quickly on large data sets. However, it also has several deficiencies: the number of clusters K must be specified in advance, the initial cluster centers are selected arbitrarily, and the algorithm is sensitive to noise points. To address these shortcomings of the traditional K-Means clustering algorithm, this paper presents an improved K-Means algorithm that uses a noise data filter. The algorithm applies density-based detection methods, based on the characteristics of noise data, and adds steps for discovering and processing noise data to the original algorithm. Preprocessing the data set to exclude noise data before clustering significantly improves the cohesion of the resulting clusters, effectively reduces the impact of noise data on the K-Means algorithm, and makes the clustering results more accurate.
  • Keywords
    data mining; pattern clustering; statistical analysis; density-based detection methods; improved K-means clustering algorithm; noise data filter; partition-based cluster analysis method; Algorithm design and analysis; Filtering algorithms; Iris; Partitioning algorithms; Software; K-Means; cluster; outlier
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communication Software and Networks (ICCSN), 2011 IEEE 3rd International Conference on
  • Conference_Location
    Xi'an
  • Print_ISBN
    978-1-61284-485-5
  • Type

    conf

  • DOI
    10.1109/ICCSN.2011.6014384
  • Filename
    6014384
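
The approach summarized in the abstract (filter out low-density noise points, then run K-Means on what remains) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not give the density criterion, so the neighbor-count rule and the names `filter_noise`, `radius`, and `min_neighbors` are hypothetical, and the clustering step is plain Lloyd-style K-Means with randomly sampled initial centers.

```python
import math
import random


def filter_noise(points, radius, min_neighbors):
    """Split points into (kept, noise) by a simple density rule (an assumption):
    a point is kept if at least `min_neighbors` other points lie within `radius`."""
    kept, noise = [], []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= radius
        )
        (kept if neighbors >= min_neighbors else noise).append(p)
    return kept, noise


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd-style K-Means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # arbitrary initial centers
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Two tight groups plus one isolated point that acts as noise.
data = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
    (20.0, -20.0),
]
clean, noise = filter_noise(data, radius=1.0, min_neighbors=2)
centers, labels = kmeans(clean, k=2)
```

Here the isolated point is removed before clustering, so it cannot pull a cluster center away from the two dense groups. The values of `radius` and `min_neighbors` are chosen by hand for this toy data and would need tuning on real data sets.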