DocumentCode :
2350122
Title :
Hybrid sampling for imbalanced data
Author :
Seiffert, Chris ; Khoshgoftaar, Taghi M. ; Van Hulse, Jason
Author_Institution :
Florida Atlantic University, Boca Raton, USA
fYear :
2008
fDate :
13-15 July 2008
Firstpage :
202
Lastpage :
207
Abstract :
Decision tree learning in the presence of imbalanced data is an issue of great practical importance, as such data is ubiquitous in a wide variety of application domains. We propose hybrid data sampling, which combines two sampling techniques, such as random oversampling and random undersampling, to create a balanced dataset for use in the construction of decision tree classification models. The results demonstrate that our methodology is often able to improve the performance of a C4.5 decision tree learner in the context of imbalanced data.
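The abstract does not give the sampling proportions, so the following is only a minimal sketch of the general idea of combining random undersampling with random oversampling. The function name `hybrid_sample`, its parameters, and the choice to resample both classes to the midpoint of the two class sizes are assumptions made for illustration, not details taken from the paper.

```python
import random


def hybrid_sample(rows, labels, minority_label, seed=0):
    """Balance a two-class dataset with random under- and oversampling.

    Assumption (not from the paper): both classes are resampled to the
    midpoint of the original minority and majority class sizes.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority or not majority:
        raise ValueError("both classes must be present")
    if len(minority) > len(majority):
        raise ValueError("minority_label must name the smaller class")

    # Hypothetical target size: halfway between the two class sizes.
    target = (len(minority) + len(majority)) // 2

    # Random undersampling: keep `target` majority examples, no replacement.
    kept_majority = rng.sample(majority, target)

    # Random oversampling: keep every minority example and add randomly
    # chosen duplicates (with replacement) until the class reaches `target`.
    extra = [rng.choice(minority) for _ in range(target - len(minority))]

    chosen = kept_majority + minority + extra
    rng.shuffle(chosen)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]
```

For example, calling `hybrid_sample` on 90 majority and 10 minority examples returns 50 examples of each class under this midpoint assumption; the balanced output could then be passed to any decision tree learner.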
Keywords :
Boosting; Classification algorithms; Classification tree analysis; Context modeling; Costs; Data mining; Decision trees; Machine learning; Sampling methods; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Reuse and Integration, 2008. IRI 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV, USA
Print_ISBN :
978-1-4244-2659-1
Electronic_ISBN :
978-1-4244-2660-7
Type :
conf
DOI :
10.1109/IRI.2008.4583030
Filename :
4583030