Title :
Cost-sensitive learning methods for imbalanced data
Author :
Thai-Nghe, Nguyen ; Gantner, Zeno ; Schmidt-Thieme, Lars
Author_Institution :
Inf. Syst. & Machine Learning Lab., Univ. of Hildesheim, Hildesheim, Germany
Abstract :
Class imbalance is one of the challenging problems for machine learning algorithms. When learning from highly imbalanced data, most classifiers are overwhelmed by the majority class examples, so the false negative rate is always high. Although researchers have introduced many methods to deal with this problem, including resampling techniques and cost-sensitive learning (CSL), most of them focus on only one of these techniques. This study presents two empirical methods that deal with class imbalance using both resampling and CSL. The first method combines several sampling techniques with CSL using support vector machines (SVM) and compares the resulting combinations. The second method applies CSL with a cost ratio (cost matrix) that is optimized locally. Our experimental results on 18 imbalanced datasets from the UCI repository show that the first method can reduce misclassification costs and that the second method can improve classifier performance.
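The idea of tuning a cost ratio, as mentioned for the second method, can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes a classifier that already outputs positive-class probabilities, uses the standard result that a cost ratio r = C(FN)/C(FP) with zero cost for correct predictions corresponds to a decision threshold of 1/(1+r), and picks the ratio that maximizes the G-mean on held-out data. The function names, the candidate ratios, and the toy data are all hypothetical.

```python
from math import sqrt


def g_mean(y_true, y_pred):
    """Geometric mean of true positive rate and true negative rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(tpr * tnr)


def predict_with_cost_ratio(probs, ratio):
    """Predict the positive class when p >= 1 / (1 + ratio).

    Assumption: ratio = C(FN) / C(FP) and correct predictions cost nothing.
    """
    threshold = 1.0 / (1.0 + ratio)
    return [1 if p >= threshold else 0 for p in probs]


def search_cost_ratio(y_true, probs, candidates):
    """Return the candidate ratio with the highest G-mean (first one wins ties)."""
    best_ratio, best_score = None, -1.0
    for ratio in candidates:
        score = g_mean(y_true, predict_with_cost_ratio(probs, ratio))
        if score > best_score:
            best_ratio, best_score = ratio, score
    return best_ratio, best_score


# Hypothetical held-out data: 8 majority (0) and 2 minority (1) examples,
# with made-up positive-class probabilities from some classifier.
y_val = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
p_val = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.40, 0.30, 0.45]

ratio, score = search_cost_ratio(y_val, p_val, [1, 2, 3, 4, 8])
print(ratio, round(score, 3))
```

With equal costs (ratio 1) every example in this toy set is assigned to the majority class, which gives a G-mean of zero; raising the cost of false negatives lowers the threshold until the minority examples are recovered, at the price of more false positives.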
Keywords :
learning (artificial intelligence); pattern classification; support vector machines; user interfaces; CSL; UCI; cost sensitive learning methods; machine learning; Cancer; Kernel; Measurement; Nearest neighbor searches; Noise; Rain
Conference_Title :
Neural Networks (IJCNN), The 2010 International Joint Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-6916-1
DOI :
10.1109/IJCNN.2010.5596486