DocumentCode :
3336532
Title :
Addressing Class Imbalance in Non-binary Classification Problems
Author :
Seliya, Naeem ; Xu, Zhiwei ; Khoshgoftaar, Taghi M.
Author_Institution :
Comput. & Inf. Sci., Univ. of Michigan - Dearborn, Dearborn, MI
Volume :
1
fYear :
2008
fDate :
3-5 Nov. 2008
Firstpage :
460
Lastpage :
466
Abstract :
Class imbalance is a real and cumbersome obstacle to building a useful, practical classification model. We present an approach to addressing class imbalance in classification problems that involve three or more categories, i.e., non-binary problems. This study differs from related work because most prior studies address class imbalance only for binary classification, even when that means transforming a non-binary dataset into a binary classification problem. We propose a simple yet effective approach to alleviating class imbalance when the classification problem involves more than two classes. The process, which comprises four methods, applies random undersampling and random oversampling to different parts of the dataset to achieve better classification performance. The proposed data sampling methods are evaluated on two real-world datasets obtained from the UCI Repository for Machine Learning Databases, using two commonly used classification algorithms: C4.5 and RIPPER. Our results demonstrate that multi-group classification accuracy increases significantly in most cases after the proposed data sampling methods are applied. This positive outcome motivates further research on class imbalance in non-binary classification problems.
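Illustrative sketch (not from the paper): the abstract does not spell out the four proposed methods, so the following Python fragment only shows the two generic building blocks it names, random undersampling and random oversampling, applied per class in a multi-class dataset. The function name `resample_to_target` and the `target` parameter are hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def resample_to_target(rows, labels, target, seed=0):
    """Bring every class to exactly `target` examples.

    Classes larger than `target` are randomly undersampled (without
    replacement); classes smaller than `target` are randomly oversampled
    (duplicates drawn with replacement). This is a generic illustration,
    not one of the paper's four methods.
    """
    rng = random.Random(seed)

    # Group examples by class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) > target:
            # Random undersampling: keep a random subset.
            chosen = rng.sample(members, target)
        else:
            # Random oversampling: keep all, then add random duplicates.
            extra = [rng.choice(members) for _ in range(target - len(members))]
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels
```

For instance, with three classes of sizes 50, 10 and 3 and `target=10`, the first class is cut down to 10 examples, the second is left unchanged, and the third is padded with 7 randomly chosen duplicates.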
Keywords :
learning (artificial intelligence); pattern classification; random processes; sampling methods; class imbalance; data sampling methods; machine learning; nonbinary classification; random oversampling; random undersampling; Artificial intelligence; Classification algorithms; Databases; Information science; Machine learning; Machine learning algorithms; Predictive models; Sampling methods; Training data; USA Councils; Machine learning; artificial intelligence; class imbalance; data sampling; non-binary classifiers;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Tools with Artificial Intelligence, 2008. ICTAI '08. 20th IEEE International Conference on
Conference_Location :
Dayton, OH
ISSN :
1082-3409
Print_ISBN :
978-0-7695-3440-4
Type :
conf
DOI :
10.1109/ICTAI.2008.120
Filename :
4669724