Title :
Distance learning for categorical attribute based on context information
Author :
Khorshidpour, Zeinab ; Hashemi, Sattar ; Hamzeh, Ali
Author_Institution :
Dept. Electron. & Comput. Eng., Shiraz Univ., Shiraz, Iran
Abstract :
In this paper, we propose a novel method for measuring the dissimilarity of categorical data. Our approach consists of two steps. In the first step, we select a relevant subset of the full attribute set to serve as the context for a given attribute. In the second step, we compute the dissimilarity between pairs of values of the same attribute using the context defined in the first step. The dissimilarity between two categorical values of an attribute is computed as a combination of the dissimilarities between the conditional probability distributions of the context attributes given these two values. Experiments with real data show that our dissimilarity estimation method improves the accuracy of the popular nearest neighbor classifier.
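The second step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the context is selected, which dissimilarity is used between distributions, or how the per-attribute results are combined. The sketch assumes the context attributes are already given, uses total variation distance between the conditional distributions, and combines them with a plain mean. The names `conditional_distribution`, `value_dissimilarity`, and `toy_rows` are hypothetical.

```python
from collections import Counter


def conditional_distribution(rows, target, context_attr, value):
    """Estimate P(context_attr | target == value) from a list of dict rows."""
    counts = Counter(r[context_attr] for r in rows if r[target] == value)
    total = sum(counts.values())
    if total == 0:
        return {}  # value never observed for the target attribute
    return {k: c / total for k, c in counts.items()}


def value_dissimilarity(rows, target, context, x, y):
    """Dissimilarity between values x and y of the attribute `target`.

    Assumption: total variation distance per context attribute, averaged
    over the context. The paper may use a different distance and combination.
    """
    if not context:
        raise ValueError("context must contain at least one attribute")
    total = 0.0
    for attr in context:
        p = conditional_distribution(rows, target, attr, x)
        q = conditional_distribution(rows, target, attr, y)
        keys = set(p) | set(q)
        total += 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
    return total / len(context)


# Toy data (invented for illustration): "shape" is the context for "color".
toy_rows = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "round"},
    {"color": "green", "shape": "round"},
    {"color": "green", "shape": "round"},
    {"color": "blue", "shape": "square"},
    {"color": "blue", "shape": "square"},
]

print(value_dissimilarity(toy_rows, "color", ["shape"], "red", "green"))
print(value_dissimilarity(toy_rows, "color", ["shape"], "red", "blue"))
```

In this toy data, "red" and "green" co-occur with the same shapes, so they come out as identical under the chosen context, while "red" and "blue" share no context values and are maximally dissimilar. A learned value-to-value dissimilarity of this kind could then replace the simple match/mismatch overlap measure inside a nearest neighbor classifier.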
Keywords :
data mining; distance learning; probability; ubiquitous computing; categorical attribute; categorical data dissimilarity measurement; conditional probability distributions; context attribute; context information; distance learning; nearest neighbor classifier; Categorical data; Distance function learning; Irrelevant feature; Nearest neighbor;
Conference_Title :
Software Technology and Engineering (ICSTE), 2010 2nd International Conference on
Conference_Location :
San Juan, PR
Print_ISBN :
978-1-4244-8667-0
Electronic_ISBN :
978-1-4244-8666-3
DOI :
10.1109/ICSTE.2010.5608801