DocumentCode :
3090110
Title :
Data clustering with mixed features by multi objective genetic algorithm
Author :
Dutta, D. ; Dutta, Pranab ; Sil, J.
Author_Institution :
Dept. of Comput. Sci. & Inf. Technol., Univ. of Burdwan, Golapbug, India
fYear :
2012
fDate :
4-7 Dec. 2012
Firstpage :
336
Lastpage :
341
Abstract :
This paper studies a real-coded multi-objective genetic algorithm (MOGA) based K-clustering method, where K is the number of clusters and is known a priori. The proposed method can handle data sets with both continuous and categorical features (mixed features). Clusters are commonly represented by the means of continuous features and the modes of categorical features; for this reason, K-means and K-modes are the most popular clustering algorithms for continuous and categorical features, respectively. The search power of the genetic algorithm (GA) is exploited to find suitable clusters and cluster centroids (means or modes) so that the intra-cluster distance (Homogeneity, H) and the inter-cluster distance (Separation, S) are optimized simultaneously. This is achieved by measuring H and S with a special distance-per-feature metric that is suitable for both continuous and categorical features. We selected four benchmark data sets from the UCI Machine Learning Repository that contain both continuous and categorical features. K-means and K-modes are hybridized with the GA to combine the global search capability of the GA with the local search capability of K-means and K-modes. To account for context sensitivity, we use special operators called “pairwise crossover” and “substitution”.
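The abstract does not define the distance-per-feature metric or the exact forms of H and S, so the following is only a minimal sketch under stated assumptions: continuous features contribute a range-normalized absolute difference, categorical features contribute a simple 0/1 mismatch, and the total is averaged over the features. The function names (`per_feature_distance`, `centroid`, `homogeneity`, `separation`) and the toy records are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from statistics import mean, mode


def per_feature_distance(a, b, is_categorical, ranges):
    """Average per-feature distance between two mixed-type records.

    Assumption: categorical features use 0/1 mismatch, continuous
    features use the absolute difference divided by the feature range.
    """
    total = 0.0
    for x, y, cat, rng in zip(a, b, is_categorical, ranges):
        if cat:
            total += 0.0 if x == y else 1.0
        else:
            total += abs(x - y) / rng if rng else 0.0
    return total / len(a)


def centroid(members, is_categorical):
    """Mean of each continuous feature, mode of each categorical feature."""
    columns = zip(*members)
    return tuple(mode(col) if cat else mean(col)
                 for col, cat in zip(columns, is_categorical))


def homogeneity(clusters, centroids, is_categorical, ranges):
    """Assumed H: mean distance of records to their own centroid (lower is better)."""
    return mean(per_feature_distance(rec, cen, is_categorical, ranges)
                for members, cen in zip(clusters, centroids)
                for rec in members)


def separation(centroids, is_categorical, ranges):
    """Assumed S: mean pairwise distance between centroids (higher is better)."""
    return mean(per_feature_distance(c1, c2, is_categorical, ranges)
                for c1, c2 in combinations(centroids, 2))


# Hypothetical toy data: one continuous and one categorical feature.
is_categorical = (False, True)
ranges = (10.0, None)  # range of the continuous feature; unused for categorical
clusters = [
    [(1.0, "red"), (3.0, "red")],
    [(9.0, "blue"), (11.0, "blue")],
]
centroids = [centroid(members, is_categorical) for members in clusters]
H = homogeneity(clusters, centroids, is_categorical, ranges)
S = separation(centroids, is_categorical, ranges)
```

In a multi-objective setting, a candidate clustering would be scored by the pair (H, S), with H minimized and S maximized; how the paper combines or ranks these objectives is not stated in the abstract.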
Keywords :
data mining; genetic algorithms; pattern clustering; search problems; unsupervised learning; K-clustering method; K-means clustering algorithm; K-modes clustering algorithm; UCI machine learning repository; cluster centroids; context sensitivity; data clustering; data mining; data set categorical features; data set continuous features; global searching capabilities; high level knowledge extraction; intercluster distances; intracluster distance; pairwise crossover operator; real coded multiobjective genetic algorithm; substitution operator; unsupervised learning process; Biological cells; Clustering algorithms; Genetic algorithms; Indexes; Optimization; Sociology; Statistics;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Hybrid Intelligent Systems (HIS), 2012 12th International Conference on
Conference_Location :
Pune
Print_ISBN :
978-1-4673-5114-0
Type :
conf
DOI :
10.1109/HIS.2012.6421357
Filename :
6421357