DocumentCode
1798313
Title
A new distance metric for unsupervised learning of categorical data
Author
Hong Jia ; Yiu-ming Cheung
Author_Institution
Dept. of Comput. Sci., Hong Kong Baptist Univ., Hong Kong, China
fYear
2014
fDate
6-11 July 2014
Firstpage
1893
Lastpage
1899
Abstract
A distance metric is the basis of many learning algorithms, and its effectiveness usually has a significant influence on the learning results. Measuring distance for numerical data is generally a tractable task, but for categorical data sets it can be a nontrivial problem. This paper therefore presents a new distance metric for categorical data based on the characteristics of categorical values. Specifically, the distance between two values of one attribute is determined both by the frequency probabilities of these two values and by the values of other attributes that are highly interdependent with that attribute. Promising experimental results on different real data sets have shown the effectiveness of the proposed distance metric.
Keywords
data analysis; unsupervised learning; categorical data; categorical values; distance metric; frequency probabilities; Frequency measurement; Hamming distance; Indexes; Joints; Probability; Redundancy
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4799-6627-1
Type
conf
DOI
10.1109/IJCNN.2014.6889890
Filename
6889890