Title :
Data clustering with mixed features by multi objective genetic algorithm
Author :
Dutta, D. ; Dutta, Pranab ; Sil, J.
Author_Institution :
Dept. of Comput. Sci. & Inf. Technol., Univ. of Burdwan, Golapbag, India
Abstract :
In this paper, a real-coded multi-objective genetic algorithm (MOGA) based K-clustering method is studied, where K represents the number of clusters known a priori. The proposed method can handle data sets with both continuous and categorical features (mixed features). Commonly, the means and modes of features represent clusters for continuous and categorical features, respectively. For this reason, K-means and K-modes are the most popular clustering algorithms for continuous and categorical features, respectively. The searching power of the genetic algorithm (GA) is exploited to search for suitable clusters and cluster centroids (means or modes) so that intra-cluster distance (Homogeneity, H) and inter-cluster distance (Separation, S) are simultaneously optimized. This is achieved by measuring H and S using a special distance-per-feature metric suitable for both continuous and categorical features. We have selected four benchmark data sets from the UCI Machine Learning Repository that contain both continuous and categorical features. Here, K-means and K-modes are hybridized with GA to combine the global searching capabilities of GA with the local searching capabilities of K-means and K-modes. Considering context sensitivity, we have used two special operators called “pairwise crossover” and “substitution”.
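The two objectives named in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the distance-per-feature metric, so the version below is an assumption: categorical features use simple matching (0 if equal, 1 otherwise), continuous features use the absolute difference divided by the feature range, and the sum is averaged over the number of features. The function names and the exact definitions of H and S are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import Sequence, Union

Value = Union[float, str]
Record = Sequence[Value]


def per_feature_distance(a: Record, b: Record,
                         is_categorical: Sequence[bool],
                         ranges: Sequence[float]) -> float:
    """Average per-feature distance between two records (assumed form)."""
    total = 0.0
    for x, y, categorical, span in zip(a, b, is_categorical, ranges):
        if categorical:
            # Assumption: simple matching for categorical features.
            total += 0.0 if x == y else 1.0
        elif span > 0:
            # Assumption: range-normalized absolute difference for continuous features.
            total += abs(x - y) / span
    return total / len(a)


def homogeneity(data: Sequence[Record], labels: Sequence[int],
                centroids: Sequence[Record],
                is_categorical: Sequence[bool],
                ranges: Sequence[float]) -> float:
    """Mean distance from each record to its assigned centroid (lower is better)."""
    total = sum(
        per_feature_distance(x, centroids[label], is_categorical, ranges)
        for x, label in zip(data, labels)
    )
    return total / len(data)


def separation(centroids: Sequence[Record],
               is_categorical: Sequence[bool],
               ranges: Sequence[float]) -> float:
    """Mean pairwise distance between centroids (higher is better)."""
    pairs = list(combinations(centroids, 2))
    if not pairs:
        return 0.0  # fewer than two centroids: no separation to measure
    total = sum(per_feature_distance(a, b, is_categorical, ranges) for a, b in pairs)
    return total / len(pairs)
```

In a multi-objective setting such as the one the abstract describes, a candidate solution would be scored on both values at once, minimizing H while maximizing S, rather than on a single weighted sum.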
Keywords :
data mining; genetic algorithms; pattern clustering; search problems; unsupervised learning; K-clustering method; K-means clustering algorithm; K-modes clustering algorithm; UCI machine learning repository; cluster centroids; context sensitivity; data clustering; data set categorical features; data set continuous features; global searching capabilities; high level knowledge extraction; intercluster distances; intracluster distance; pairwise crossover operator; real coded multiobjective genetic algorithm; substitution operator; unsupervised learning process; Biological cells; Clustering algorithms; Genetic algorithms; Indexes; Optimization; Sociology; Statistics;
Conference_Title :
Hybrid Intelligent Systems (HIS), 2012 12th International Conference on
Conference_Location :
Pune
Print_ISBN :
978-1-4673-5114-0
DOI :
10.1109/HIS.2012.6421357