DocumentCode :
2886820
Title :
Learning in the Class Imbalance Problem When Costs are Unknown for Errors and Rejects
Author :
Xiaowan Zhang ; Bao-Gang Hu
Author_Institution :
NLPR/LIAMA, Inst. of Autom., Beijing, China
fYear :
2012
fDate :
10-10 Dec. 2012
Firstpage :
194
Lastpage :
201
Abstract :
In the class imbalance problem, most existing approaches need to know the costs in order to reach reasonable classification results, and some of them cannot work properly when the costs are unknown. Moreover, to the best of our knowledge, no cost-sensitive approach is able to handle abstaining classification when the costs of both errors and rejects are unknown. This challenge motivates the present work. Based on information theory, we propose a novel cost-free learning approach that maximizes the normalized mutual information between the target outputs and the decision outputs of classifiers. With this approach, we can handle classification with or without rejection when no cost terms are given. As the degree of class imbalance changes, the proposed approach balances errors and rejects accordingly and automatically. A further advantage of the approach is its ability to derive optimal reject thresholds for abstaining classification and the "equivalent" costs for binary probabilistic classification. Numerical investigations on several benchmark data sets confirm that the approach overcomes this challenge.
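The criterion named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the normalization I(T;Y)/H(T), where T is the target and Y the decision output, a binary classifier that outputs a score for class 1, and a plain grid search over two reject thresholds. The function names and the grid are hypothetical and chosen only for illustration.

```python
import math

def normalized_mutual_information(confusion):
    """Return I(T;Y) / H(T) for a confusion matrix of counts.

    confusion[i][j] = number of samples with true class i and decision j.
    An extra last column may hold rejected samples.
    Assumption: normalization by the target entropy H(T).
    """
    total = sum(sum(row) for row in confusion)
    row_p = [sum(row) / total for row in confusion]
    n_cols = len(confusion[0])
    col_p = [sum(row[j] for row in confusion) / total for j in range(n_cols)]
    mi = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute nothing
                p = count / total
                mi += p * math.log2(p / (row_p[i] * col_p[j]))
    h_t = -sum(p * math.log2(p) for p in row_p if p > 0)
    return mi / h_t if h_t > 0 else 0.0

def confusion_with_reject(scores, labels, t_low, t_high):
    """Build a 2x3 confusion matrix: decide 0 below t_low, 1 above t_high,
    and reject (column 2) otherwise."""
    conf = [[0, 0, 0], [0, 0, 0]]
    for score, label in zip(scores, labels):
        if score < t_low:
            decision = 0
        elif score > t_high:
            decision = 1
        else:
            decision = 2
        conf[label][decision] += 1
    return conf

def best_thresholds(scores, labels, grid):
    """Grid-search the threshold pair (t_low <= t_high) with the highest NMI.
    Returns (nmi, t_low, t_high). No cost terms are used."""
    best = (-1.0, None, None)
    for t_low in grid:
        for t_high in grid:
            if t_low <= t_high:
                conf = confusion_with_reject(scores, labels, t_low, t_high)
                nmi = normalized_mutual_information(conf)
                if nmi > best[0]:
                    best = (nmi, t_low, t_high)
    return best
```

Under these assumptions, a classifier that assigns every sample of a 90/10 data set to the majority class scores 90% accuracy but an NMI of zero, which is one way to see why the criterion is sensitive to class imbalance without any cost terms being supplied.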
Keywords :
information theory; learning (artificial intelligence); optimisation; pattern classification; probability; benchmark data sets; binary probabilistic classification; class imbalance problem; classifiers; cost-free learning approach; decision outputs; equivalent costs; errors; information theory; normalized mutual information maximization; optimal reject thresholds; rejects; target outputs; Accuracy; Equations; Error analysis; Machine learning; Mathematical model; Mutual information; Vectors; imbalanced learning; classification; reject option; cost-free learning; mutual information;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
Print_ISBN :
978-1-4673-5164-5
Type :
conf
DOI :
10.1109/ICDMW.2012.167
Filename :
6406441