Title :
An Increased Performance of Clustering High Dimensional Data Using Principal Component Analysis
Author :
Tajunisha, N. ; Saravanan, V.
Author_Institution :
Dept. of Comput. Sci., Sri Ramakrishna Coll. of Arts & Sci. for Women, Coimbatore, India
Abstract :
In many application domains, such as information retrieval, computational biology, and image processing, the data dimension is usually very high. Developing effective clustering methods for high-dimensional datasets is a challenging problem due to the curse of dimensionality. The k-means clustering algorithm is used in many practical applications. However, it is computationally expensive, and the quality of the resulting clusters depends heavily on the selection of the initial centroids and on the dimensionality of the data. When the dataset is high-dimensional, the accuracy of the results may fall short of expectations, because the chosen dataset cannot be assumed to be free of noise and flaws. It is therefore necessary to reduce the dimensionality of the given dataset in order to improve efficiency and accuracy. This paper proposes a new approach to improve the accuracy of the clustering results that uses principal component analysis (PCA) both to determine the initial centroids and to reduce the dimensionality of the data.
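The abstract does not say exactly how PCA is used to pick the initial centroids, so the following is only a minimal sketch of the general idea under stated assumptions. It projects the data onto its top principal components (computed via SVD of the centered data), then seeds k-means with a hypothetical heuristic: sort the points along the first principal component, split them into k contiguous groups of roughly equal size, and take each group's mean as an initial centroid. The function names (`pca_reduce`, `pca_initial_centroids`, `kmeans`), the number of retained components, and the synthetic data are all illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its top principal components (SVD of centered data)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def pca_initial_centroids(Z, k):
    """Hypothetical seeding heuristic (an assumption, not the paper's rule):
    sort points along the first principal component, split them into k
    contiguous groups, and use each group's mean as an initial centroid."""
    order = np.argsort(Z[:, 0])
    groups = np.array_split(order, k)
    return np.array([Z[g].mean(axis=0) for g in groups])


def _assign(Z, centroids):
    # Squared Euclidean distance from every point to every centroid.
    d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(Z, centroids, max_iter=100):
    """Standard Lloyd iterations starting from the given centroids."""
    centroids = centroids.copy()
    for _ in range(max_iter):
        labels = _assign(Z, centroids)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(Z, centroids), centroids


# Illustrative synthetic data (assumption): three blobs in 50 dimensions,
# separated along a single axis, with unit Gaussian noise in every dimension.
rng = np.random.default_rng(0)
axis = np.zeros(50)
axis[0] = 1.0
X = np.repeat([0.0, 10.0, 20.0], 50)[:, None] * axis + rng.normal(size=(150, 50))

Z = pca_reduce(X, n_components=2)        # 50 dimensions -> 2
init = pca_initial_centroids(Z, k=3)     # PCA-based seeding
labels, centroids = kmeans(Z, init)
print(np.bincount(labels, minlength=3))  # cluster sizes
```

Computing the principal axes through the SVD of the centered data avoids forming the covariance matrix explicitly. Because the seeding is deterministic, repeated runs on the same data give the same clustering, which is one motivation for replacing random initialization; whether this particular heuristic matches the paper's results is not established here.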
Keywords :
data analysis; pattern clustering; principal component analysis; computational biology; curse of dimensionality; data dimension; high dimensional data; image processing; information retrieval; k-means clustering; Accuracy; Algorithm design and analysis; Clustering algorithms; Iris; Machine learning algorithms; Partitioning algorithms; dimension reduction; k-means
Conference_Title :
2010 First International Conference on Integrated Intelligent Computing (ICIIC)
Conference_Location :
Bangalore
Print_ISBN :
978-1-4244-7963-4
Electronic_ISBN :
978-0-7695-4152-5
DOI :
10.1109/ICIIC.2010.31