DocumentCode
1798135
Title
A hybrid coupled k-nearest neighbor algorithm on imbalance data
Author
Chunming Liu ; Longbing Cao ; Yu, Philip S.
Author_Institution
Adv. Analytics Inst., Univ. of Technol. Sydney, Sydney, NSW, Australia
fYear
2014
fDate
6-11 July 2014
Firstpage
2011
Lastpage
2018
Abstract
State-of-the-art classification algorithms rarely consider the relationships between attributes in a data set and instead assume that the attributes are independent of each other (the IID assumption). In real-world data, however, attributes interact to some degree through explicit or implicit relationships. Although classifiers for class-balanced data are relatively well developed, classifying class-imbalanced data is not straightforward, especially for mixed-type data that has both categorical and numerical features. Limited research has been conducted on class-imbalanced data. Some algorithms synthesize or remove instances to make the class sizes comparable, which may change the inherent data structure or introduce noise into the source data. Distance-based or similarity-based algorithms, in turn, ignore the relationships between features when computing similarity. This paper proposes a hybrid coupled k-nearest neighbor classification algorithm (HC-kNN) for mixed-type data. It discretizes the numerical features so that the inter-coupling similarity can be applied to them as it is to categorical features, and then combines this coupled similarity with the original similarity or distance to overcome the shortcomings of previous algorithms. The experimental results show that the proposed algorithm achieves higher average performance than related algorithms (e.g., the variants of kNN, Decision Tree, SMOTE and NaiveBayes).
Keywords
pattern classification; HC-kNN; IID; categorical features; class-balanced data; class-imbalanced data; classifier; data structure; distance based algorithm; explicit relationship; hybrid coupled k-nearest neighbor classification algorithm; imbalance data; implicit relationship; mixed type data; numerical features; similarity based algorithm; Algorithm design and analysis; Classification algorithms; Clouds; Couplings; Size measurement; Training; Training data;
fLanguage
English
Publisher
ieee
Conference_Titel
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4799-6627-1
Type
conf
DOI
10.1109/IJCNN.2014.6889798
Filename
6889798
Link To Document