DocumentCode
3079821
Title
An empirical comparison of repetitive undersampling techniques
Author
Van Hulse, Jason ; Khoshgoftaar, Taghi M. ; Napolitano, Amri
Author_Institution
Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL, USA
fYear
2009
fDate
10-12 Aug. 2009
Firstpage
29
Lastpage
34
Abstract
A common problem for data mining and machine learning practitioners is class imbalance. When examples of one class greatly outnumber examples of the other class(es), traditional machine learning algorithms can perform poorly. Random undersampling is a technique that has shown great potential for alleviating the problem of class imbalance. However, undersampling discards training examples, and this information loss can hinder classification performance in some cases. To overcome this problem, repetitive undersampling techniques have been proposed. These techniques generate an ensemble of models, each trained on a different undersampled subset of the training data. In doing so, less information is lost and classification performance is improved. In this study, we evaluate the performance of several repetitive undersampling techniques. To our knowledge, no previous study has compared repetitive undersampling techniques so thoroughly.
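The general idea described in the abstract (an ensemble whose members are each trained on a different undersampled subset) can be sketched as follows. This is only an illustrative sketch, not one of the specific techniques evaluated in the paper, which the abstract does not name. The 1:1 class ratio, the nearest-centroid base learner, the majority vote, and all function names (`undersample`, `fit_centroids`, `predict_centroids`, `fit_ensemble`, `predict_ensemble`) are assumptions made for the example.

```python
import random
from collections import Counter


def undersample(X, y, rng):
    """Randomly keep n_min examples per class, where n_min is the size of
    the smallest class (assumed 1:1 target ratio)."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    Xs, ys = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            Xs.append(row)
            ys.append(label)
    return Xs, ys


def fit_centroids(X, y):
    """Hypothetical base learner: store one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, value in enumerate(xi):
            acc[j] += value
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict_centroids(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(model[c], x))
    return min(model, key=dist2)


def fit_ensemble(X, y, n_models=10, seed=0):
    """Train n_models base learners, each on its own random undersample."""
    rng = random.Random(seed)
    return [fit_centroids(*undersample(X, y, rng)) for _ in range(n_models)]


def predict_ensemble(models, x):
    """Combine the members by simple majority vote (an assumed rule)."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

Because each member draws a different random subset of the majority class, the ensemble as a whole sees more majority examples than any single undersampled model would, which is the motivation the abstract gives for repeating the undersampling.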
Keywords
data mining; learning (artificial intelligence); pattern classification; class imbalance; empirical comparison; hinder classification; machine learning; repetitive undersampling technique; Application software; Computer science; Data engineering; Machine learning algorithms; Medical diagnosis; Performance loss; Sampling methods; Training data
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Reuse & Integration, 2009. IRI '09. IEEE International Conference on
Conference_Location
Las Vegas, NV
Print_ISBN
978-1-4244-4114-3
Electronic_ISBN
978-1-4244-4116-7
Type
conf
DOI
10.1109/IRI.2009.5211614
Filename
5211614