DocumentCode :
2167589
Title :
A GA-based clustering algorithm for large data sets with mixed and categorical values
Author :
Jie, Li ; Xinbo, Gao ; Li-cheng, Jiao
Author_Institution :
National Key Lab. of Radar Signal Process., Xidian Univ., Xi'an, China
fYear :
2003
fDate :
27-30 Sept. 2003
Firstpage :
102
Lastpage :
107
Abstract :
In data mining, cluster analysis must often be performed on large data sets with mixed numeric and categorical values. However, most existing clustering algorithms are efficient only for numeric data, not for mixed data sets. To address this, this paper presents a novel clustering algorithm for such mixed data sets, obtained by modifying the common cost function, the trace of the within-cluster dispersion matrix. A genetic algorithm (GA) is used to optimize the new cost function and obtain a valid clustering result. Experimental results illustrate that the new GA-based clustering algorithm is feasible for large data sets with mixed numeric and categorical values.
Keywords :
data mining; genetic algorithms; pattern clustering; very large databases; GA-based clustering algorithm; categorical values; cluster analysis; cluster dispersion matrix; cost function; data mining; genetic algorithm; large data sets; mixed data set; mixed values; numeric data; Clustering algorithms; Cost function; Data analysis; Data mining; Genetic algorithms; Performance analysis; Prototypes; Radar signal processing; Signal analysis; Signal processing algorithms;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence and Multimedia Applications, 2003. ICCIMA 2003. Proceedings. Fifth International Conference on
Print_ISBN :
0-7695-1957-1
Type :
conf
DOI :
10.1109/ICCIMA.2003.1238108
Filename :
1238108