DocumentCode :
2573375
Title :
Weighted Rough Clustering on categorical data
Author :
Fu, Jian ; Yin, Jian
Author_Institution :
Sch. of Inf. Sci. & Technol., SUN YAT-SEN Univ., Guangzhou, China
fYear :
2011
fDate :
27-29 June 2011
Firstpage :
939
Lastpage :
944
Abstract :
Clustering is an unsupervised machine learning framework that has attracted much attention recently. Current clustering algorithms mainly focus on samples with real-valued attributes, while there is little work on samples described wholly or partly by categorical attributes. The difficulty with categorical attributes is that the similarity between such samples cannot be evaluated directly by Euclidean distance, as many real-valued methods do. We address this problem by adopting rough set theory: rough similarity is used to define the similarity between samples. Each attribute is assigned a weight indicating its importance for clustering, and an adaptive update process based on information gain is performed to find the optimal solution for both the weights and the clusters. The proposed method has three benefits: it handles categorical data naturally; it is not sensitive to the input order of the samples to be clustered; and it optimizes both the importance of attributes and the number of clusters simultaneously. Experiments on UCI benchmark data show its effectiveness in comparison with several well-known earlier methods.
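The abstract does not define rough similarity or the information-gain update precisely. What follows is therefore only a minimal Python sketch under stated assumptions, not the authors' algorithm. It uses a weighted-overlap similarity as a hypothetical stand-in for rough similarity: an attribute contributes its weight when two samples share the same value on it, i.e. are indiscernible on that attribute. It also derives attribute weights by normalizing each attribute's information gain with respect to a given clustering. All function names are hypothetical. The alternating loop that re-clusters with the new weights and searches over the number of clusters is omitted.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

Sample = Sequence[Hashable]


def weighted_overlap_similarity(a: Sample, b: Sample, weights: Sequence[float]) -> float:
    """Hypothetical stand-in for rough similarity: weighted share of attributes
    on which the two categorical samples take identical values."""
    total = sum(weights)
    if total == 0:
        return 0.0
    agree = sum(w for x, y, w in zip(a, b, weights) if x == y)
    return agree / total


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of a list of cluster labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(data: Sequence[Sample], labels: Sequence[Hashable], attr: int) -> float:
    """Entropy of the cluster labels minus their entropy after grouping by one attribute."""
    n = len(data)
    groups: dict = {}
    for row, lab in zip(data, labels):
        groups.setdefault(row[attr], []).append(lab)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def update_weights(data: Sequence[Sample], labels: Sequence[Hashable]) -> list:
    """Hypothetical weight update: normalize per-attribute gains to sum to 1,
    falling back to uniform weights if no attribute is informative."""
    m = len(data[0])
    gains = [information_gain(data, labels, j) for j in range(m)]
    total = sum(gains)
    if total <= 0:
        return [1.0 / m] * m
    return [g / total for g in gains]


# Toy categorical data with a fixed (assumed) clustering into two groups.
data = [("red", "small", "x"), ("red", "small", "y"),
        ("blue", "large", "x"), ("blue", "large", "y")]
labels = [0, 0, 1, 1]

weights = update_weights(data, labels)
print(weights)  # [0.5, 0.5, 0.0]
print(weighted_overlap_similarity(data[0], data[1], weights))  # 1.0
print(weighted_overlap_similarity(data[0], data[2], weights))  # 0.0
```

In this toy example the third attribute splits both clusters evenly. It therefore receives zero weight and no longer affects the similarity between samples.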
Keywords :
pattern clustering; pattern matching; rough set theory; unsupervised learning; Euclidean distance; UCI benchmark data; categorical data; information gain; real value attribute; unsupervised machine learning; weighted rough data clustering; Accuracy; Algorithm design and analysis; Clustering algorithms; Rocks; Set theory; clustering; rough set; rough similarity;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Service System (CSSS), 2011 International Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4244-9762-1
Type :
conf
DOI :
10.1109/CSSS.2011.5972099
Filename :
5972099