DocumentCode :
3079821
Title :
An empirical comparison of repetitive undersampling techniques
Author :
Van Hulse, Jason ; Khoshgoftaar, Taghi M. ; Napolitano, Amri
Author_Institution :
Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL, USA
fYear :
2009
fDate :
10-12 Aug. 2009
Firstpage :
29
Lastpage :
34
Abstract :
A common problem for data mining and machine learning practitioners is class imbalance. When examples of one class greatly outnumber examples of the other class(es), traditional machine learning algorithms can perform poorly. Random undersampling is a technique that has shown great potential for alleviating the problem of class imbalance. However, undersampling leads to information loss which can hinder classification performance in some cases. To overcome this problem, repetitive undersampling techniques have been proposed. These techniques generate an ensemble of models, each trained on a different, undersampled subset of the training data. In doing so, less information is lost and classification performance is improved. In this study, we evaluate the performance of several repetitive undersampling techniques. To our knowledge, no study has so thoroughly compared repetitive undersampling techniques.
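The abstract describes repetitive undersampling only at a high level: an ensemble in which each member is trained on a different undersampled subset of the training data. The following is a minimal Python sketch of that general idea, not one of the specific techniques evaluated in the paper. It assumes binary labels (0 = majority, 1 = minority), a 1:1 class ratio in every subset, a hypothetical nearest-centroid base learner, and a vote-fraction combination rule; all function names are illustrative.

```python
import random
from statistics import mean

# Assumed label convention: 0 = majority class, 1 = minority class.
MAJORITY, MINORITY = 0, 1


def centroid_fit(X, y):
    """Hypothetical base learner: store the mean feature vector of each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [mean(col) for col in zip(*rows)]
    return model


def centroid_predict(model, x):
    """Return the label whose centroid is nearest in squared Euclidean distance."""
    return min(model, key=lambda lbl: sum((a - b) ** 2 for a, b in zip(x, model[lbl])))


def repetitive_undersample_fit(X, y, n_models=10, seed=0):
    """Train n_models base learners, each on all minority examples plus a
    different random sample (without replacement) of the majority examples.
    The 1:1 class ratio per subset is an assumption for illustration."""
    rng = random.Random(seed)
    min_idx = [i for i, t in enumerate(y) if t == MINORITY]
    maj_idx = [i for i, t in enumerate(y) if t == MAJORITY]
    k = min(len(min_idx), len(maj_idx))
    models = []
    for _ in range(n_models):
        subset = min_idx + rng.sample(maj_idx, k)
        models.append(centroid_fit([X[i] for i in subset], [y[i] for i in subset]))
    return models


def repetitive_undersample_predict(models, x, threshold=0.5):
    """Combine members by the fraction of votes cast for the minority class."""
    votes = sum(centroid_predict(m, x) == MINORITY for m in models)
    return MINORITY if votes / len(models) >= threshold else MAJORITY


# Toy data: 100 majority points near the origin, 5 minority points near (5, 5).
X = [(0.1 * (i % 10), 0.1 * (i // 10)) for i in range(100)]
X += [(5.0 + 0.1 * i, 5.0) for i in range(5)]
y = [MAJORITY] * 100 + [MINORITY] * 5

models = repetitive_undersample_fit(X, y, n_models=5, seed=1)
print(repetitive_undersample_predict(models, (5.0, 5.0)))  # prints 1
print(repetitive_undersample_predict(models, (0.0, 0.0)))  # prints 0
```

Because each member discards a different part of the majority class, the ensemble as a whole sees more of the majority data than a single undersampled training set would. The base learner, class ratio, and combination rule above are placeholders and are not taken from the paper.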
Keywords :
data mining; learning (artificial intelligence); pattern classification; class imbalance; empirical comparison; hinder classification; machine learning; repetitive undersampling technique; Application software; Computer science; Data engineering; Data mining; Machine learning; Machine learning algorithms; Medical diagnosis; Performance loss; Sampling methods; Training data
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Reuse & Integration, 2009. IRI '09. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-4114-3
Electronic_ISBN :
978-1-4244-4116-7
Type :
conf
DOI :
10.1109/IRI.2009.5211614
Filename :
5211614