• DocumentCode
    249103
  • Title

    A study on classifying imbalanced datasets

  • Author

    Lakshmi, T. Jaya ; Prasad, C. Siva Rama

  • Author_Institution
    Vasireddy Venkatadri Inst. of Technol., Guntur, India
  • fYear
    2014
  • fDate
    19-20 Aug. 2014
  • Firstpage
    141
  • Lastpage
    145
  • Abstract
    Many real-world problems are modeled as binary classification problems in which samples of one class often outnumber samples of the other. This imbalance reduces prediction accuracy on minority class samples even though overall accuracy remains high. Ignoring the misclassification rate of the minority class causes severe problems in many applications, such as fraudulent credit card transactions, medical diagnosis, and e-mail foldering. Many classification algorithms in the literature are designed for balanced datasets and treat majority and minority class samples equally. In this study, existing solutions to the class imbalance problem and common evaluation techniques used for class imbalance are reviewed. The solutions were applied to three real-world datasets. A combination of SMOTE and Bagging with Random Forest was observed to produce the best overall accuracy on the minority class.
  • Keywords
    learning (artificial intelligence); pattern classification; SMOTE; bagging-with-random forest; binary classification problems; class imbalance problem; e-mail foldering; fraudulent credit card transactions; imbalanced dataset classification; majority class samples; medical diagnosis; minority class samples; real world datasets; Accuracy; Algorithm design and analysis; Bagging; Electronic mail; Fault diagnosis; Prediction algorithms; Radio frequency;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Networks & Soft Computing (ICNSC), 2014 First International Conference on
  • Conference_Location
    Guntur
  • Print_ISBN
    978-1-4799-3485-0
  • Type

    conf

  • DOI
    10.1109/CNSC.2014.6906652
  • Filename
    6906652