Title :
A CSA-based clustering algorithm for large data sets with mixed numeric and categorical values
Author :
Jie Li; Xinbo Gao; Li-cheng Jiao
Author_Institution :
Sch. of Electron. Eng., Xidian Univ., Xi'an, China
Abstract :
In data mining, cluster analysis must often be performed on large data sets with mixed numeric and categorical values. However, most existing clustering algorithms are efficient only for numeric data, not for mixed data sets. To address this, this paper presents a novel clustering algorithm for such mixed data sets that modifies the common cost function, the trace of the within-cluster dispersion matrix. The clonal selection algorithm (CSA) is used to optimize the new cost function. Experimental results illustrate that the new CSA-based clustering algorithm is feasible for large data sets with mixed numeric and categorical values.
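The abstract does not state the modified cost function itself. As a rough illustration only, the sketch below assumes a k-prototypes-style form: the numeric part is the usual sum of squared deviations from each cluster mean (the trace of the within-cluster dispersion matrix), and the categorical part counts mismatches against each cluster's per-attribute mode. The function name `mixed_cost` and the weight `gamma` are hypothetical and are not taken from the paper.

```python
from collections import Counter

def mixed_cost(numeric, categorical, labels, gamma=1.0):
    """Illustrative cost for a partition of mixed data (assumed form, not the paper's)."""
    # Group row indices by cluster label.
    clusters = {}
    for i, k in enumerate(labels):
        clusters.setdefault(k, []).append(i)

    total = 0.0
    for members in clusters.values():
        n = len(members)
        # Numeric part: squared deviation from the cluster mean, per attribute.
        for j in range(len(numeric[0])):
            col = [numeric[i][j] for i in members]
            mean = sum(col) / n
            total += sum((v - mean) ** 2 for v in col)
        # Categorical part (assumption): mismatches against the cluster mode.
        for j in range(len(categorical[0])):
            counts = Counter(categorical[i][j] for i in members)
            total += gamma * (n - max(counts.values()))
    return total

# Toy data: one numeric attribute and one categorical attribute.
numeric = [[0.0], [2.0], [10.0], [12.0]]
categorical = [["a"], ["a"], ["b"], ["c"]]
good = mixed_cost(numeric, categorical, [0, 0, 1, 1])
bad = mixed_cost(numeric, categorical, [0, 1, 0, 1])
```

An optimizer such as the CSA would search over candidate partitions (or prototypes) for the one that minimizes a cost of this kind; in the toy data, the partition that groups nearby points scores lower than the one that mixes them.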
Keywords :
data mining; matrix algebra; optimisation; pattern clustering; statistical analysis; clonal selection algorithm; cluster analysis; cluster dispersion matrix; clustering algorithm; cost function; large data sets; mixed categorical value; mixed data set; mixed numerical value; Algorithm design and analysis; Clustering algorithms; Data analysis; Data engineering; Databases; Partitioning algorithms; Performance analysis; Prototypes
Conference_Title :
Intelligent Control and Automation, 2004. WCICA 2004. Fifth World Congress on
Print_ISBN :
0-7803-8273-0
DOI :
10.1109/WCICA.2004.1342001