Title :
Improvement in k-Means Clustering Algorithm Using Data Clustering
Author :
Rajeswari, K. ; Acharya, Omkar ; Sharma, Mayur ; Kopnar, Mahesh ; Karandikar, Kiran
Author_Institution :
Pimpri Chinchwad Coll. of Eng., Pune, India
Abstract :
Data clustering is the process of organizing objects that share the same characteristics into groups, called clusters. It is an unsupervised learning technique for classification of data. K-Means is a well-known and widely used algorithm for cluster analysis. In this algorithm, n data points are divided into k clusters according to a similarity measurement criterion. K-Means is fast, which is why it is so commonly used. Its applications include vector quantization, cluster analysis, and feature learning. However, the results it produces depend heavily on the choice of initial cluster centroids. Its main shortcoming is that an appropriate number of clusters must be supplied. Specifying the number of clusters before applying the algorithm is highly impractical and requires deep knowledge of the clustering field. In this project, we propose an algorithm that improves the initialization of centroids for K-Means. We will work with n-dimensional numerical and categorical data sets. For similarity measurement, we will consider the Manhattan distance, Dice distance, and cosine distance. The results of the proposed algorithm will be compared with those of the original K-Means, and its quality and complexity will be evaluated against the existing algorithm.
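The three similarity measures named in the abstract, and the K-Means step that assigns each point to its nearest centroid, can be sketched as follows. This is a minimal illustration only, not the authors' proposed method: the function names are hypothetical, the abstract does not give the centroid-initialization procedure (so it is not shown), and applying the Dice distance to sets of categorical values is an assumption.

```python
import math


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences between two numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def dice_distance(a, b):
    """One minus the Dice coefficient of two collections of categorical values.

    Assumption: each record is treated as a set of category labels.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # two empty records are treated as identical
    return 1.0 - 2.0 * len(set_a & set_b) / (len(set_a) + len(set_b))


def assign_to_nearest(points, centroids, distance):
    """Return, for each point, the index of the closest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))
        for point in points
    ]


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 2.0), (10.0, 10.0)]
    centroids = [(1.0, 1.0), (9.0, 9.0)]  # hypothetical initial centroids
    print(assign_to_nearest(points, centroids, manhattan_distance))
```

Because the assignment depends on the starting centroids, a different choice of `centroids` can produce a different partition, which is the sensitivity the abstract refers to.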
Keywords :
data analysis; pattern clustering; unsupervised learning; vectors; cluster centroids; cosine distance; data clustering; data points; dice distance; feature learning; k-means clustering algorithm; manhattan distance; numerical data sets; similarity measurement criterion; unsupervised learning technique; vector quantization; Algorithm design and analysis; Clustering algorithms; Communities; Complexity theory; Computers; Electronic mail; Linear programming; Data Clustering; K-Means; centroid; unsupervised learning;
Conference_Title :
Computing Communication Control and Automation (ICCUBEA), 2015 International Conference on
Conference_Location :
Pune
DOI :
10.1109/ICCUBEA.2015.205