DocumentCode
2327175
Title
An efficient clustering algorithm for mixed type attributes in large dataset
Author
Yin, Jian ; Tan, Zhi-Fang ; Ren, Jiang-Tao ; Chen, Yi-Qun
Author_Institution
Dept. of Comput. Sci., Zhongshan Univ., Guangzhou, China
Volume
3
fYear
2005
fDate
18-21 Aug. 2005
Firstpage
1611
Abstract
Clustering is a widely used technique in data mining. Many clustering algorithms exist, but most either handle only a single type of attribute or handle both attribute types but are inefficient on large data sets. Few algorithms do both well. In this article, we propose a clustering algorithm that can handle large datasets with mixed-type attributes. We first use a CF*tree (analogous to the CF-tree in BIRCH) to pre-cluster the dataset, so that the dense regions are stored in the leaf nodes. We then treat each dense region as a single point and cluster these regions with an improved k-prototype algorithm. Experiments show that this algorithm is very efficient at clustering large datasets with mixed-type attributes.
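The second phase described in the abstract can be illustrated with a minimal sketch. It assumes the standard k-prototypes dissimilarity (squared Euclidean distance on numeric attributes plus a weight gamma times the number of categorical mismatches), not the authors' improved variant, whose details the abstract does not give. The function names, the (centroid, modes, count) summary of a dense region, and the default gamma are hypothetical choices made for illustration only.

```python
from collections import Counter


def summarise_region(numeric_rows, categorical_rows):
    """Collapse one dense region into a single representative point.

    Assumption: a region is summarised by the mean of each numeric
    attribute and the most frequent value of each categorical attribute.
    """
    n = len(numeric_rows)
    centroid = [sum(col) / n for col in zip(*numeric_rows)]
    modes = [Counter(col).most_common(1)[0][0] for col in zip(*categorical_rows)]
    return centroid, modes, n


def mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma=1.0):
    """Standard k-prototypes dissimilarity between a point and a prototype."""
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    categorical_part = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return numeric_part + gamma * categorical_part


def nearest_prototype(x_num, x_cat, prototypes, gamma=1.0):
    """Return the index of the closest (numeric, categorical) prototype."""
    return min(
        range(len(prototypes)),
        key=lambda i: mixed_dissimilarity(
            x_num, x_cat, prototypes[i][0], prototypes[i][1], gamma
        ),
    )
```

In a full pipeline, each leaf entry of the pre-clustering tree would be passed through something like summarise_region, and the resulting representatives would be assigned to prototypes with nearest_prototype, presumably weighted by the region's point count when prototypes are updated.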
Keywords
data mining; pattern clustering; tree data structures; very large databases; CF*tree; CF-tree in BIRCH; clustering algorithm; k-prototype; large dataset; mixed type attributes; pre-cluster datasets; Clustering algorithms; Clustering methods; Computer science; Computer science education; Databases; Design methodology; Machine learning; Partitioning algorithms; Statistics; CF*-tree; Clustering
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location
Guangzhou, China
Print_ISBN
0-7803-9091-1
Type
conf
DOI
10.1109/ICMLC.2005.1527202
Filename
1527202