• DocumentCode
    3559949
  • Title
    Exploratory Undersampling for Class-Imbalance Learning
  • Author
    Liu, Xu-Ying ; Wu, Jianxin ; Zhou, Zhi-Hua
  • Author_Institution
    National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing
  • Volume
    39
  • Issue
    2
  • fYear
    2009
  • fDate
    4/1/2009 12:00:00 AM
  • Firstpage
    539
  • Lastpage
    550
  • Abstract
    Undersampling is a popular method for dealing with class-imbalance problems; it uses only a subset of the majority class and is therefore very efficient. Its main deficiency is that many majority class examples are ignored. We propose two algorithms to overcome this deficiency. EasyEnsemble samples several subsets from the majority class, trains a learner on each of them, and combines the outputs of those learners. BalanceCascade trains the learners sequentially; in each step, the majority class examples that are correctly classified by the currently trained learners are removed from further consideration. Experimental results show that both methods achieve higher Area Under the ROC Curve, F-measure, and G-mean values than many existing class-imbalance learning methods. Moreover, when the same number of weak classifiers is used, their training time is approximately the same as that of undersampling, which is significantly faster than other methods.
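    The two procedures described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the base learner here is a toy nearest-centroid scorer (`train_centroid`, a hypothetical helper chosen to keep the example dependency-free), and the function names, parameters, and removal rule (drop majority examples whose ensemble score is at most zero) are assumptions made for illustration.

```python
import random


def train_centroid(pos, neg):
    """Toy stand-in base learner (an assumption, not the paper's learner).

    Returns a scorer: score(x) > 0 means x is closer to the minority centroid.
    """
    dim = len(pos[0])
    cp = [sum(p[d] for p in pos) / len(pos) for d in range(dim)]
    cn = [sum(n[d] for n in neg) / len(neg) for d in range(dim)]

    def score(x):
        dist_pos = sum((x[d] - cp[d]) ** 2 for d in range(dim))
        dist_neg = sum((x[d] - cn[d]) ** 2 for d in range(dim))
        return dist_neg - dist_pos

    return score


def easy_ensemble(minority, majority, n_subsets=5, seed=0):
    """Train one learner per independently sampled balanced majority subset."""
    rng = random.Random(seed)
    learners = []
    for _ in range(n_subsets):
        subset = rng.sample(majority, len(minority))
        learners.append(train_centroid(minority, subset))
    # Combine the learners by averaging their scores.
    return lambda x: sum(h(x) for h in learners) / len(learners)


def balance_cascade(minority, majority, n_steps=5, seed=0):
    """Train learners sequentially, shrinking the majority pool each step."""
    rng = random.Random(seed)
    pool = list(majority)
    learners = []

    def ensemble(x):
        return sum(h(x) for h in learners) / len(learners)

    for _ in range(n_steps):
        subset = rng.sample(pool, len(minority))
        learners.append(train_centroid(minority, subset))
        # Simplified removal rule (assumption): keep only the majority
        # examples that the current ensemble still scores as minority.
        pool = [x for x in pool if ensemble(x) > 0]
        if len(pool) < len(minority):
            break  # too few majority examples left to form a balanced subset
    return ensemble


# Synthetic, well-separated data used only to exercise the sketch.
data_rng = random.Random(1)
minority = [(data_rng.gauss(2, 0.5), data_rng.gauss(2, 0.5)) for _ in range(10)]
majority = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(200)]

easy = easy_ensemble(minority, majority)
cascade = balance_cascade(minority, majority)
```

    Both functions return a scoring function; a positive score is read as a minority-class prediction. On data this well separated the cascade may stop after very few steps, because almost every majority example is already classified correctly.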
  • Keywords
    data mining; learning (artificial intelligence); BalanceCascade; EasyEnsemble; F-measure; G-mean; class-imbalance learning; ensemble learning; machine learning; undersampling
  • fLanguage
    English
  • Journal_Title
    Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
  • Publisher
    IEEE
  • Conference_Location
    12/16/2008 12:00:00 AM
  • ISSN
    1083-4419
  • Type
    jour
  • DOI
    10.1109/TSMCB.2008.2007853
  • Filename
    4717268