• DocumentCode
    3079821
  • Title
    An empirical comparison of repetitive undersampling techniques
  • Author
    Van Hulse, Jason; Khoshgoftaar, Taghi M.; Napolitano, Amri
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL, USA
  • fYear
    2009
  • fDate
    10-12 Aug. 2009
  • Firstpage
    29
  • Lastpage
    34
  • Abstract
    A common problem for data mining and machine learning practitioners is class imbalance. When examples of one class greatly outnumber examples of the other class(es), traditional machine learning algorithms can perform poorly. Random undersampling is a technique that has shown great potential for alleviating the problem of class imbalance. However, undersampling leads to information loss, which can hinder classification performance in some cases. To overcome this problem, repetitive undersampling techniques have been proposed. These techniques generate an ensemble of models, each trained on a different, undersampled subset of the training data. In doing so, less information is lost and classification performance is improved. In this study, we evaluate the performance of several repetitive undersampling techniques. To our knowledge, no study has so thoroughly compared repetitive undersampling techniques.
  • Keywords
    data mining; learning (artificial intelligence); pattern classification; class imbalance; empirical comparison; hinder classification; machine learning; repetitive undersampling technique; Application software; Computer science; Data engineering; Data mining; Machine learning; Machine learning algorithms; Medical diagnosis; Performance loss; Sampling methods; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse & Integration, 2009. IRI '09. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    978-1-4244-4114-3
  • Electronic_ISBN
    978-1-4244-4116-7
  • Type
    conf
  • DOI
    10.1109/IRI.2009.5211614
  • Filename
    5211614
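
The abstract describes repetitive undersampling as building an ensemble of models, each trained on a different undersampled subset of the training data. The following is a minimal sketch of that general idea only, not the specific techniques evaluated in the paper, which this record does not detail. The nearest-centroid base learner, the majority-vote combination, the balanced 1:1 sampling ratio, and all function names are assumptions chosen for illustration.

```python
import random
from collections import Counter


def undersample(X, y, rng):
    """Randomly keep, for every class, as many examples as the smallest class has."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    Xs, ys = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            Xs.append(row)
            ys.append(label)
    return Xs, ys


def fit_centroids(X, y):
    """Toy base learner (an assumption): store one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, value in enumerate(xi):
            acc[j] += value
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_centroids(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def fit_ensemble(X, y, n_models=10, seed=0):
    """Train n_models base learners, each on its own random undersample."""
    rng = random.Random(seed)
    return [fit_centroids(*undersample(X, y, rng)) for _ in range(n_models)]


def predict_ensemble(models, x):
    """Combine the base learners by majority vote (an assumed combination rule)."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

Because each model draws a fresh random subset of the majority class, the ensemble as a whole sees more of the majority examples than any single undersampled model would, which is the information-loss argument the abstract makes.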