DocumentCode :
1798091
Title :
Coupled fuzzy k-nearest neighbors classification of imbalanced non-IID categorical data
Author :
Chunming Liu ; Longbing Cao ; Philip S. Yu
Author_Institution :
Adv. Analytics Inst., Univ. of Technol. Sydney, Sydney, NSW, Australia
fYear :
2014
fDate :
6-11 July 2014
Firstpage :
1122
Lastpage :
1129
Abstract :
Mining imbalanced data has recently received increasing attention because of its difficulty and its wide range of real-world applications. Most existing work focuses on numerical data and either manipulates the data structure, which essentially changes the data characteristics, or develops new distance or similarity measures designed for data under the so-called IID assumption, namely that data are independent and identically distributed. This is inconsistent with real-life data and business needs, which require fully respecting the data structure and the coupling relationships embedded in data objects, features, and feature values. In this paper, we propose a novel coupled fuzzy similarity-based classification approach that captures the difference between classes through a fuzzy membership and the couplings through a coupled object similarity, and we incorporate both into the most popular classifier, kNN, to form a coupled fuzzy kNN (i.e., CF-kNN). We test the approach on 14 categorical data sets and compare it with several kNN variants and classic classifiers, including C4.5 and NaiveBayes. The experimental results show that CF-kNN outperforms the baselines, and that classifiers incorporating the proposed coupled fuzzy similarity perform better than their original versions.
Keywords :
data mining; fuzzy set theory; pattern classification; CF-kNN; categorical data sets; classic classifier; coupled fuzzy k-nearest neighbor classification; coupled fuzzy kNN; coupled object similarity; coupling relationships; data characteristics; data objects; data structure manipulation; distance measures; feature values; fuzzy membership; fuzzy similarity-based classification approach; imbalanced data mining; imbalanced nonIID categorical data; numerical data; similarity measures; Algorithm design and analysis; Couplings; Data mining; Distributed databases; Equations; Feature extraction; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-6627-1
Type :
conf
DOI :
10.1109/IJCNN.2014.6889773
Filename :
6889773