  • DocumentCode
    3336532
  • Title
    Addressing Class Imbalance in Non-binary Classification Problems
  • Author
    Seliya, Naeem ; Xu, Zhiwei ; Khoshgoftaar, Taghi M.
  • Author_Institution
    Comput. & Inf. Sci., Univ. of Michigan - Dearborn, Dearborn, MI
  • Volume
    1
  • fYear
    2008
  • fDate
    3-5 Nov. 2008
  • Firstpage
    460
  • Lastpage
    466
  • Abstract
    Class imbalance is a practical obstacle to building useful classification models. We present an approach to addressing class imbalance in classification problems that involve three or more categories, i.e., non-binary problems. This study differs from related work because most of the literature addresses class imbalance only for binary classification, even when that means transforming a non-binary dataset into a binary classification problem. We propose an effective yet simple approach to alleviating class imbalance when the classification problem involves more than two classes. The process, realized as four different methods, applies random undersampling and random oversampling to different parts of the dataset to achieve better classification performance. The proposed data sampling methods are evaluated on two real-world datasets obtained from the UCI Repository of Machine Learning Databases, using two commonly used classification algorithms: C4.5 and RIPPER. Our results demonstrate that multi-group classification accuracy increases significantly in most cases after the proposed data sampling methods are applied. This positive outcome motivates further research on class imbalance in non-binary classification problems.
  • Keywords
    learning (artificial intelligence); pattern classification; random processes; sampling methods; class imbalance; data sampling methods; machine learning; nonbinary classification; random oversampling; random undersampling; Artificial intelligence; Classification algorithms; Databases; Information science; Machine learning; Machine learning algorithms; Predictive models; Sampling methods; Training data; USA Councils; Machine learning; artificial intelligence; class imbalance; data sampling; non-binary classifiers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence, 2008. ICTAI '08. 20th IEEE International Conference on
  • Conference_Location
    Dayton, OH
  • ISSN
    1082-3409
  • Print_ISBN
    978-0-7695-3440-4
  • Type
    conf
  • DOI
    10.1109/ICTAI.2008.120
  • Filename
    4669724
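
The abstract describes combining random undersampling and random oversampling on a dataset with more than two classes, but this record does not spell out the four methods. The following is only a minimal sketch of the general idea, not the authors' procedure: it assumes a single per-class target size, undersamples classes larger than the target, and oversamples (with replacement) classes smaller than it. The function name `resample_to_target` and the `target` and `seed` parameters are hypothetical and chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def resample_to_target(rows, labels, target, seed=0):
    """Return (rows, labels) in which every class has exactly `target` examples.

    Assumption for illustration only: one shared target size per class.
    The paper's four methods are not specified in this record.
    """
    rng = random.Random(seed)

    # Group examples by class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) > target:
            # Random undersampling: keep `target` examples, without replacement.
            chosen = rng.sample(members, target)
        else:
            # Random oversampling: keep all examples and add random duplicates.
            extra = [rng.choice(members) for _ in range(target - len(members))]
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels


if __name__ == "__main__":
    # Toy three-class dataset with a 70/25/5 split (made-up data).
    rows = list(range(100))
    labels = ["a"] * 70 + ["b"] * 25 + ["c"] * 5
    new_rows, new_labels = resample_to_target(rows, labels, target=25)
    print(Counter(new_labels))
```

In practice the resampling would be applied to the training split only, so that duplicated examples cannot leak into the evaluation data.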