  • DocumentCode
    1624568
  • Title
    A Study on Outlier distance and SSE with multidimensional datasets in K-means clustering
  • Author
    Rajee, A.M. ; Francis, F. Sagayaraj
  • Author_Institution
    Dept. of CSE, Pondicherry Eng. Coll., Puducherry, India
  • fYear
    2013
  • Firstpage
    33
  • Lastpage
    36
  • Abstract
    Clustering is a well-known technique in data mining, and the K-means algorithm is one of the most widely used clustering techniques. It is popular because it is conceptually simple, computationally fast and memory efficient. This paper examines how noise points limit the efficacy of the K-means algorithm by analyzing them in terms of the sum of squared errors (SSE), which remains the dominant validation measure for K-means. Experiments were carried out on synthetic data sets with multiple dimensions and cluster sizes: numerous noise points were added to the K clusters, and the effect of the noise distance on SSE was measured. The results indicate that the distance of the noise points from the cluster center influences SSE. This correlation is significant because the K-means algorithm assumes that the number of clusters in the data is known in advance, which is not necessarily true in real-world applications. The study probes the pathognomonic (distinctively characteristic) role of noise points in the clustering outcome, with the aim of obtaining better results in real-world applications.
  • Keywords
    data mining; pattern clustering; statistical analysis; SSE; clustering outcome; k-means clustering technique; multidimensional dataset; noise distance; outlier distance; pathognomonic role; sum-of-squared error; Noise; Three-dimensional displays; K-means; data clustering; multidimensional data sets; outliers
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Computing (ICoAC), 2013 Fifth International Conference on
  • Conference_Location
    Chennai
  • Print_ISBN
    978-1-4799-3447-8
  • Type
    conf
  • DOI
    10.1109/ICoAC.2013.6921923
  • Filename
    6921923
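The abstract's central observation, that the distance of a noise point from its cluster center drives up SSE, can be illustrated with a short sketch. The Python below is a hypothetical illustration, not the authors' experimental code: it assumes Gaussian synthetic clusters, assumes the noise point stays assigned to the cluster it perturbs (no K-means reassignment step), and the helper names `make_blob` and `sse_with_outlier` are invented for this example. SSE is computed as the sum, over all clusters, of the squared Euclidean distances from each point to its own cluster's centroid.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of d-dimensional points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(clusters: List[List[Point]]) -> float:
    """SSE: squared distance of every point to its own cluster's centroid,
    summed over all clusters."""
    total = 0.0
    for pts in clusters:
        c = centroid(pts)
        total += sum(sq_dist(p, c) for p in pts)
    return total


def make_blob(center: Point, n: int, spread: float,
              rng: random.Random) -> List[List[float]]:
    """Hypothetical synthetic cluster: n Gaussian points around `center`."""
    return [[c + rng.gauss(0.0, spread) for c in center] for _ in range(n)]


def sse_with_outlier(cluster: List[Point], direction: Point,
                     distance: float) -> float:
    """Hypothetical helper (assumption: fixed assignment). Append one noise
    point at `distance` from the cluster's centroid along `direction`, keep it
    in this cluster, and return the SSE of the enlarged cluster."""
    norm = math.sqrt(sum(x * x for x in direction))
    c = centroid(cluster)
    noise = [ci + distance * di / norm for ci, di in zip(c, direction)]
    return sse([list(cluster) + [noise]])


if __name__ == "__main__":
    rng = random.Random(0)
    blob = make_blob([0.0, 0.0, 0.0], n=50, spread=1.0, rng=rng)  # 3-D cluster
    base = sse([blob])
    for d in (0.0, 5.0, 10.0, 20.0):
        increase = sse_with_outlier(blob, [1.0, 0.0, 0.0], d) - base
        print(f"noise distance {d:5.1f}: SSE increase {increase:8.3f}")
```

Under this fixed-assignment assumption the increase is exactly n/(n+1)·d² for a cluster of n points and a noise point at distance d from the original centroid, so SSE grows quadratically with the outlier's distance. In a full K-means run the outlier could also pull a centroid far enough to change point assignments, which this sketch deliberately ignores.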