Regional Information Center for Science and Technology - Clustering algorithm based on Condensed Set Dissimilarity for high dimensional sparse data of categorical attributes

DocumentCode :

3270179

Title :

Clustering algorithm based on Condensed Set Dissimilarity for high dimensional sparse data of categorical attributes

Author :

Wu, Sen ; Liu, Juanjuan ; Wei, Guiying

Author_Institution :

Sch. of Econ. & Manage., Univ. of Sci. & Technol. Beijing, Beijing, China

fYear :

2011

fDate :

18-20 Jan. 2011

Firstpage :

445

Lastpage :

448

Abstract :

Clustering categorical data is challenging, especially when the data is high dimensional and sparse. This paper proposes a new algorithm, named CABOC, for clustering high dimensional sparse data with categorical attributes. Based on a newly defined concept, "Condensed Set Dissimilarity", the algorithm directly computes the dissimilarity of all the objects with sparse categorical attributes in a set. Furthermore, during computation the algorithm records only a Condensed Set Reduction vector for the set, which is defined to represent simply and accurately the information about all the objects in the set that the clustering needs. As a result, the computational complexity of the algorithm is low. A numeric example of customer cluster analysis illustrates the effectiveness of the algorithm.
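
The abstract does not give the formulas for Condensed Set Dissimilarity or the Condensed Set Reduction vector, so the sketch below is only an illustration of the general idea it describes: keep a compact per-set summary, and compute a set-level dissimilarity from that summary without revisiting individual objects. Everything here is an assumption made for illustration and is not taken from the paper: the names `SetSummary`, `summarize`, `merge`, and `dissimilarity`, the representation of an object as a dict of its non-empty attributes, and the ratio used as the dissimilarity.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class SetSummary:
    """Hypothetical compact summary of a set of sparse categorical objects."""
    size: int                  # number of objects in the set
    common: Dict[str, str]     # attributes on which every object has the same value
    differing: FrozenSet[str]  # attributes on which at least two objects disagree


def summarize(obj: Dict[str, str]) -> SetSummary:
    # A single object agrees with itself on all of its non-empty attributes.
    return SetSummary(1, dict(obj), frozenset())


def merge(a: SetSummary, b: SetSummary) -> SetSummary:
    # An attribute stays "common" only if both sets agree on the same value.
    common = {k: v for k, v in a.common.items() if b.common.get(k) == v}
    # Attributes that were common in one set but not in the union now differ
    # (a different value, or a value present on one side and absent on the other).
    lost = (set(a.common) | set(b.common)) - set(common)
    differing = a.differing | b.differing | frozenset(lost)
    return SetSummary(a.size + b.size, common, differing)


def dissimilarity(s: SetSummary) -> float:
    # Illustrative stand-in ratio (an assumption, not the paper's definition):
    # more disagreeing attributes raise it; more shared attributes and
    # more objects lower it.
    if not s.common:
        return float("inf")
    return len(s.differing) / (s.size * len(s.common))


# Three hypothetical customers, each listing only its non-empty attributes.
c1 = {"color": "red", "size": "L"}
c2 = {"color": "red", "size": "M"}
c3 = {"color": "red", "brand": "x"}

cluster = merge(merge(summarize(c1), summarize(c2)), summarize(c3))
print(cluster.size, cluster.common, sorted(cluster.differing))
print(dissimilarity(cluster))
```

Because `merge` needs only the two summaries, a cluster can absorb a new object or another cluster in time proportional to the number of non-empty attributes, which is the kind of saving the abstract attributes to recording a single vector per set.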

Keywords :

data handling; data mining; pattern clustering; set theory; CABOC; categorical data clustering algorithm; condensed set dissimilarity; condensed set reduction vector; customer cluster analysis; high dimensional sparse data; information representation; sparse categorical attributes; Algorithm design and analysis; Clustering algorithms; Memory; categorical attributes

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Advanced Computer Control (ICACC), 2011 3rd International Conference on

Conference_Location :

Harbin

Print_ISBN :

978-1-4244-8809-4

Electronic_ISBN :

978-1-4244-8810-0

Type :

conf

DOI :

10.1109/ICACC.2011.6016450

Filename :

6016450

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3270179