Title :
Hybrid sampling for imbalanced data
Author :
Seiffert, Chris ; Khoshgoftaar, Taghi M. ; Van Hulse, Jason
Author_Institution :
Florida Atlantic University, Boca Raton, USA
Abstract :
Decision tree learning in the presence of imbalanced data is an issue of great practical importance, as such data is ubiquitous in a wide variety of application domains. We propose hybrid data sampling, which combines two sampling techniques, such as random oversampling and random undersampling, to create a balanced dataset for use in the construction of decision tree classification models. The results demonstrate that our methodology often improves the performance of a C4.5 decision tree learner in the context of imbalanced data.
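The idea in the abstract, combining random undersampling of the majority class with random oversampling of the minority class until the classes are balanced, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hybrid_sample`, its parameters, the binary-class setup, and the default of meeting halfway between the two class sizes are all assumptions made for illustration, since the abstract does not say how the two techniques are proportioned.

```python
import random
from collections import Counter


def hybrid_sample(rows, labels, minority_label, target_per_class=None, seed=0):
    """Illustrative hybrid sampling for a binary problem (hypothetical helper).

    Randomly undersamples the majority class and randomly oversamples the
    minority class so that both end up with `target_per_class` examples.
    """
    rng = random.Random(seed)
    minority = [(r, y) for r, y in zip(rows, labels) if y == minority_label]
    majority = [(r, y) for r, y in zip(rows, labels) if y != minority_label]
    if not minority or not majority:
        raise ValueError("both classes must be present")

    if target_per_class is None:
        # Assumption: meet halfway between the two class sizes.
        target_per_class = (len(minority) + len(majority)) // 2
    if not len(minority) <= target_per_class <= len(majority):
        raise ValueError("target must lie between the two class sizes")

    # Random undersampling: draw majority examples without replacement.
    kept_majority = rng.sample(majority, target_per_class)

    # Random oversampling: keep every minority example and add
    # duplicates drawn with replacement until the target is reached.
    extra = target_per_class - len(minority)
    grown_minority = minority + [rng.choice(minority) for _ in range(extra)]

    combined = kept_majority + grown_minority
    rng.shuffle(combined)
    out_rows = [r for r, _ in combined]
    out_labels = [y for _, y in combined]
    return out_rows, out_labels


if __name__ == "__main__":
    # Toy data: 10 minority examples (label 1), 90 majority examples (label 0).
    rows = list(range(100))
    labels = [1] * 10 + [0] * 90
    _, new_labels = hybrid_sample(rows, labels, minority_label=1)
    print(Counter(new_labels))
```

The balanced output could then be passed to any decision tree learner. Choosing a target between the two class sizes means fewer majority examples are discarded than with undersampling alone and fewer duplicates are created than with oversampling alone, which is one plausible motivation for combining the two.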
Keywords :
Boosting; Classification algorithms; Classification tree analysis; Context modeling; Costs; Data mining; Decision trees; Machine learning; Sampling methods; Training data
Conference_Title :
Information Reuse and Integration, 2008. IRI 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV, USA
Print_ISBN :
978-1-4244-2659-1
Electronic_ISBN :
978-1-4244-2660-7
DOI :
10.1109/IRI.2008.4583030