Title :
Clustering for high dimensional data
Author :
Kumar Sharma, Varun ; Bala, Anju
Author_Institution :
Comput. Sci. & Eng. Dept., Thapar Univ., Patiala, India
Abstract :
Clustering is an exploratory data analysis technique that partitions a dataset into groups. The groups are formed so that items with similar features fall in the same group and items with dissimilar features fall in different groups. Many clustering algorithms are available, and different kinds of algorithms suit different kinds of data. K-means is the most widely used clustering algorithm. It is an iterative approach that assigns points to k clusters. It gives good results and is easy to implement. However, the k-means algorithm has several issues, the main one being its high time complexity. Several improvements have been suggested by the research community, but when the algorithm is applied to high dimensional data, the computational cost becomes prohibitive. In this paper, an approach to reduce the number of distance function computations is proposed. It defines a cluster membership set for every cluster, and the distance function is calculated only for the clusters contained in this set. With this cluster membership set, the complexity of the overall algorithm is reduced.
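The abstract does not say how the cluster membership set is built, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical rule: after one full pass, each point keeps a candidate set of the m centroids nearest to it, and later passes compute distances only to those candidates. The names `kmeans_with_candidates`, `sq_dist`, and the parameter `m` are illustrative and do not come from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_with_candidates(points, k, m, iters=20, seed=0):
    """Lloyd-style k-means with a restricted candidate set per point.

    Assumption (not from the paper): the candidate set of a point is the
    m centroids that were nearest to it in the previous pass. The first
    pass compares every point against all k centroids.
    Returns (labels, centroids, number_of_distance_evaluations).
    """
    rng = random.Random(seed)
    n = len(points)
    centroids = [list(p) for p in rng.sample(points, k)]
    candidates = [list(range(k)) for _ in range(n)]  # first pass: all clusters
    labels = [-1] * n
    evaluations = 0

    for _ in range(iters):
        changed = False
        # Assignment step: distances only to the candidate centroids.
        for i, p in enumerate(points):
            dists = sorted((sq_dist(p, centroids[c]), c) for c in candidates[i])
            evaluations += len(dists)
            best = dists[0][1]
            if best != labels[i]:
                labels[i] = best
                changed = True
            # Keep the m nearest centroids as the next candidate set.
            candidates[i] = [c for _, c in dists[:m]]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if not changed:
            break
    return labels, centroids, evaluations
```

With m equal to k this reduces to ordinary k-means. With m smaller than k, each pass after the first costs n*m distance evaluations instead of n*k, at the price of possibly missing the true nearest centroid, so the result is an approximation.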
Keywords :
computational complexity; data analysis; pattern clustering; cluster membership set; clustering analysis algorithm; distance function; exploratory data analysis technique; high dimensional data clustering; k-means algorithm; point assignment; time complexity; Accuracy; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Complexity theory; Partitioning algorithms; Standards; Clustering; Data Mining; High Dimensional Data; Initial centroid; Partitioning Clustering Algorithm; k-means algorithm
Conference_Title :
Networks & Soft Computing (ICNSC), 2014 First International Conference on
Conference_Location :
Guntur
Print_ISBN :
978-1-4799-3485-0
DOI :
10.1109/CNSC.2014.6906700