• DocumentCode
    2485610
  • Title

    An Empirical Study of Learning from Imbalanced Data Using Random Forest

  • Author

    Khoshgoftaar, Taghi M. ; Golawala, Moiz ; Van Hulse, Jason

  • Author_Institution
    Florida Atlantic Univ., Boca Raton
  • Volume
    2
  • fYear
    2007
  • fDate
    29-31 Oct. 2007
  • Firstpage
    310
  • Lastpage
    317
  • Abstract
    This paper discusses a comprehensive suite of experiments that analyze the performance of the random forest (RF) learner implemented in Weka. RF is a relatively new learner, and to the best of our knowledge, only preliminary experimentation on the construction of random forest classifiers in the context of imbalanced data has been reported in previous work. Therefore, the contribution of this study is to provide an extensive empirical evaluation of RF learners built from imbalanced data. What should be the recommended default number of trees in the ensemble? What should the recommended value be for the number of attributes? How does the RF learner perform on imbalanced data when compared with other commonly used learners? We address these and other related issues in this work.
  • Keywords
    learning (artificial intelligence); pattern classification; Weka; imbalanced data; learning; random forest classifiers; random forest learner; Analysis of variance; Artificial intelligence; Bagging; Classification tree analysis; Data mining; Decision trees; Machine learning; Noise robustness; Radio frequency; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence, 2007. ICTAI 2007. 19th IEEE International Conference on
  • Conference_Location
    Patras
  • ISSN
    1082-3409
  • Print_ISBN
    978-0-7695-3015-4
  • Type

    conf

  • DOI
    10.1109/ICTAI.2007.46
  • Filename
    4410397