DocumentCode
249103
Title
A study on classifying imbalanced datasets
Author
Lakshmi, T. Jaya ; Prasad, C. Siva Rama
Author_Institution
Vasireddy Venkatadri Inst. of Technol., Guntur, India
fYear
2014
fDate
19-20 Aug. 2014
Firstpage
141
Lastpage
145
Abstract
Many real-world problems are modeled as binary classification problems in which samples of one class often outnumber those of the other. This imbalance reduces prediction accuracy on minority class samples even though overall accuracy remains high. Ignoring the misclassification rate of the minority class causes severe problems in many cases, such as fraudulent credit card transactions, medical diagnosis, and e-mail foldering. Many classification algorithms in the literature are designed for balanced datasets and treat majority and minority class samples equally. This study reviews existing solutions to the class imbalance problem and the evaluation techniques commonly used for it. The solutions were applied to three real-world datasets. A combination of SMOTE and Bagging with Random Forest was observed to produce the best overall accuracy on the minority class.
Keywords
learning (artificial intelligence); pattern classification; SMOTE; bagging-with-random forest; binary classification problems; class imbalance problem; e-mail foldering; fraudulent credit card transactions; imbalanced dataset classification; majority class samples; medical diagnosis; minority class samples; real world datasets; Accuracy; Algorithm design and analysis; Bagging; Electronic mail; Fault diagnosis; Prediction algorithms; Radio frequency;
fLanguage
English
Publisher
IEEE
Conference_Titel
Networks & Soft Computing (ICNSC), 2014 First International Conference on
Conference_Location
Guntur
Print_ISBN
978-1-4799-3485-0
Type
conf
DOI
10.1109/CNSC.2014.6906652
Filename
6906652