  • DocumentCode
    2776604
  • Title
    Sampling + reweighting: Boosting the performance of AdaBoost on imbalanced datasets
  • Author
    Yuan, Bo ; Ma, Xiaoli
  • Author_Institution
    Intell. Comput. Lab., Tsinghua Univ., Shenzhen, China
  • fYear
    2012
  • fDate
    10-15 June 2012
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Existing attempts to improve the performance of AdaBoost on imbalanced datasets have largely been focused on modifying its weight updating rule or incorporating sampling or cost sensitive learning techniques. In this paper, we propose to tackle the challenge from a novel perspective. Initially, the dataset is over-sampled and the standard AdaBoost is applied to create a series of base classifiers. Next, the weights of the classifiers are further retrained by Genetic Algorithms (GAs) or comparable optimization techniques where more targeted performance measures such as G-mean and F-measure can be directly used as the objective function. Consequently, unlike other indirect solutions, this sampling + reweighting strategy can purposefully tune AdaBoost towards a certain performance measure of interest with only moderate computational overhead. Experimental results on ten benchmark datasets show that this strategy can reliably boost the performance of AdaBoost and has consistent superiority over EasyEnsemble, which is a competent ensemble method for class imbalance learning.
  • Keywords
    genetic algorithms; learning (artificial intelligence); EasyEnsemble; F-measure; G-mean; GA; benchmark datasets; class imbalance learning; imbalanced datasets; optimization techniques; reweighting; sampling; sensitive learning techniques; standard AdaBoost; Accuracy; Cancer; Glass; Heart; Single photon emission computed tomography; Standards; Training; AdaBoost; Class Imbalance Learning; GAs; SMOTE
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), The 2012 International Joint Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4673-1488-6
  • Electronic_ISBN
    2161-4393
  • Type
    conf
  • DOI
    10.1109/IJCNN.2012.6252738
  • Filename
    6252738
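
The reweighting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`g_mean`, `ensemble_predict`, `reweight`), the GA operators (truncation selection, uniform crossover, Gaussian mutation), and all parameter values are assumptions chosen for illustration. The over-sampling and AdaBoost training steps are omitted; the base classifiers' +1/-1 votes and their initial AdaBoost weights are taken as given, and both classes are assumed to be present in the labels.

```python
import math
import random


def g_mean(y_true, y_pred):
    """Geometric mean of true-positive and true-negative rates (labels are +1 / -1)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == -1]
    tpr = sum(p == 1 for p in pos) / len(pos)
    tnr = sum(p == -1 for p in neg) / len(neg)
    return math.sqrt(tpr * tnr)


def ensemble_predict(weights, votes):
    """Weighted vote; votes[t][i] is base classifier t's +1/-1 prediction for sample i."""
    n = len(votes[0])
    scores = [sum(w * v[i] for w, v in zip(weights, votes)) for i in range(n)]
    return [1 if s > 0 else -1 for s in scores]


def reweight(votes, y_true, init_weights, pop_size=30, generations=50,
             sigma=0.3, seed=0):
    """Toy GA (assumed operators): evolve classifier weights to maximize G-mean."""
    rng = random.Random(seed)

    def fitness(w):
        return g_mean(y_true, ensemble_predict(w, votes))

    # Start from the given weights plus mutated copies of them.
    population = [list(init_weights)] + [
        [max(0.0, w + rng.gauss(0, sigma)) for w in init_weights]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]            # uniform crossover
            child = [max(0.0, w + rng.gauss(0, sigma)) for w in child]  # Gaussian mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    # Hypothetical data: classifier 0 always votes for the majority class.
    y = [1, 1, -1, -1, -1, -1]
    votes = [[-1] * 6, [1, 1, -1, -1, -1, -1], [1, -1, 1, -1, -1, -1]]
    start = [2.0, 1.0, 0.5]
    tuned = reweight(votes, y, start)
    print(g_mean(y, ensemble_predict(start, votes)),
          g_mean(y, ensemble_predict(tuned, votes)))
```

Because the parents survive into each new generation and the initial weights are part of the first population, the returned weights never score a lower G-mean than the starting weights on the data used for tuning. Swapping `g_mean` for an F-measure function would target that measure instead.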