DocumentCode
1624568
Title
A Study on Outlier distance and SSE with multidimensional datasets in K-means clustering
Author
Rajee, A.M. ; Francis, F. Sagayaraj
Author_Institution
Dept. of CSE, Pondicherry Eng. Coll., Puducherry, India
fYear
2013
Firstpage
33
Lastpage
36
Abstract
Clustering is a well-known technique in data mining, and K-means is one of the most widely used clustering algorithms. It is popular because it is conceptually simple, computationally fast, and memory efficient. This paper examines the role of noise points in limiting the efficacy of the K-means algorithm by analyzing them in terms of the sum-of-squared error (SSE), which remains the most popular validation method for K-means. Experimental studies were carried out on synthetic data sets of multiple dimensions and cluster sizes. Numerous noise points were added to the K clusters, and the effect of noise distance on SSE was examined. From the results, we infer that the distance of a noise point to the cluster center influences SSE. This correlative study is significant because the K-means algorithm assumes that the number of clusters in the data is known in advance, which is not necessarily true in real-world applications. The study probes the pathognomonic role of noise points in the clustering outcome, which should in turn help provide better results in real-world applications.
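The relationship between noise distance and SSE described in the abstract can be illustrated with a minimal sketch. This is not the authors' experimental setup: the 3-D cluster, the noise positions, and the helper names `centroid`, `sse`, and `sse_with_noise` are all hypothetical, chosen only to show how SSE grows as a single noise point moves away from a cluster center. For one cluster of n points, adding a point at distance d from the original centroid and recomputing the centroid raises SSE by d^2 * n / (n + 1).

```python
def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


def sse(points, center):
    """Sum of squared Euclidean distances from each point to center."""
    return sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in points)


def sse_with_noise(cluster, noise_point):
    """SSE after adding the noise point and recomputing the centroid."""
    augmented = cluster + [noise_point]
    return sse(augmented, centroid(augmented))


# Hypothetical 3-D cluster (assumption, not data from the paper).
cluster = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
center = centroid(cluster)
base = sse(cluster, center)

# Place one noise point at increasing distances d along the third axis.
for d in (1.0, 5.0, 10.0):
    noise = (center[0], center[1], center[2] + d)
    increase = sse_with_noise(cluster, noise) - base
    print(f"distance={d:5.1f}  SSE increase={increase:.3f}")
```

The increase is quadratic in the noise distance, so a few distant noise points can dominate the SSE of an otherwise compact cluster.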
Keywords
data mining; pattern clustering; statistical analysis; SSE; clustering outcome; k-means clustering technique; multidimensional dataset; noise distance; outlier distance; pathognomonic role; sum-of-squared error; Noise; Three-dimensional displays; K-means; data clustering; multidimensional data sets; outliers
fLanguage
English
Publisher
IEEE
Conference_Titel
Advanced Computing (ICoAC), 2013 Fifth International Conference on
Conference_Location
Chennai
Print_ISBN
978-1-4799-3447-8
Type
conf
DOI
10.1109/ICoAC.2013.6921923
Filename
6921923