Regional Information Center for Science and Technology

DocumentCode :

2350122

Title :

Hybrid sampling for imbalanced data

Author :

Seiffert, Chris ; Khoshgoftaar, Taghi M. ; Van Hulse, Jason

Author_Institution :

Florida Atlantic University, Boca Raton, USA

fYear :

2008

fDate :

13-15 July 2008

Firstpage :

202

Lastpage :

207

Abstract :

Decision tree learning in the presence of imbalanced data is an issue of great practical importance, as such data is ubiquitous in a wide variety of application domains. We propose hybrid data sampling, which uses a combination of two sampling techniques, such as random oversampling and random undersampling, to create a balanced dataset for use in the construction of decision tree classification models. The results demonstrate that our methodology is often able to improve the performance of a C4.5 decision tree learner in the context of imbalanced data.
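The abstract does not spell out the exact sampling procedure. What follows is a minimal Python sketch of one way to combine random undersampling of the majority class with random oversampling of the minority class so that both classes end up the same size. The function name `hybrid_sample`, the `target_per_class` parameter and the default midpoint target are hypothetical choices for illustration, not details taken from the paper. The balanced output would then be handed to a decision tree learner such as C4.5, which is not shown here.

```python
import random
from collections import Counter


def hybrid_sample(X, y, minority_label, target_per_class=None, seed=0):
    """Balance a binary dataset by random undersampling of the majority class
    and random oversampling (with replacement) of the minority class.

    Assumption (hypothetical, not from the paper): both classes are resized to
    the same count, defaulting to the midpoint of the two original class sizes.
    """
    rng = random.Random(seed)
    minority = [(xi, yi) for xi, yi in zip(X, y) if yi == minority_label]
    majority = [(xi, yi) for xi, yi in zip(X, y) if yi != minority_label]
    if target_per_class is None:
        target_per_class = (len(minority) + len(majority)) // 2
    if not len(minority) <= target_per_class <= len(majority):
        raise ValueError("target must lie between the minority and majority sizes")

    # Random undersampling: keep a subset of majority examples, no replacement.
    kept_majority = rng.sample(majority, target_per_class)
    # Random oversampling: keep every minority example, then add duplicates
    # drawn with replacement until the minority class reaches the target.
    extra = [rng.choice(minority) for _ in range(target_per_class - len(minority))]

    combined = kept_majority + minority + extra
    rng.shuffle(combined)
    return [xi for xi, _ in combined], [yi for _, yi in combined]


if __name__ == "__main__":
    # Toy data: 10 minority (label 1) and 90 majority (label 0) examples.
    X = list(range(100))
    y = [1] * 10 + [0] * 90
    X_bal, y_bal = hybrid_sample(X, y, minority_label=1)
    print(Counter(y_bal))  # both classes resized to the midpoint, 50 each
```

Choosing a target between the two original class sizes discards less majority data than pure undersampling and creates fewer minority duplicates than pure oversampling; that trade-off is the intuition behind combining the two, though the ratios used in the paper may differ.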

Keywords :

Boosting; Classification algorithms; Classification tree analysis; Context modeling; Costs; Data mining; Decision trees; Machine learning; Sampling methods; Training data

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information Reuse and Integration, 2008. IRI 2008. IEEE International Conference on

Conference_Location :

Las Vegas, NV, USA

Print_ISBN :

978-1-4244-2659-1

Electronic_ISBN :

978-1-4244-2660-7

Type :

conf

DOI :

10.1109/IRI.2008.4583030

Filename :

4583030

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2350122