DocumentCode :
2007001
Title :
A comparative study on sampling techniques for handling class imbalance in streaming data
Author :
Nguyen, Hien M. ; Cooper, Eric W. ; Kamei, Kentaro
Author_Institution :
Grad. Sch. of Sci. & Eng., Ritsumeikan Univ., Kusatsu, Japan
fYear :
2012
fDate :
20-24 Nov. 2012
Firstpage :
1762
Lastpage :
1767
Abstract :
Sampling is the most popular approach for handling the class imbalance problem in training data. A number of studies have recently adapted sampling techniques for dynamic learning settings in which the training set is not fixed but grows gradually over time. This paper presents an empirical study comparing over-sampling and under-sampling techniques in the context of data streaming. Experimental results show that under-sampling performs better than over-sampling at smaller training set sizes. All sampling techniques, however, are comparable when the training set becomes larger. The study also suggests that a multiple random under-sampling (MRUS) technique is a good choice for applications with imbalanced streaming data, because MRUS is the most effective technique while remaining fast.
Keywords :
data handling; learning (artificial intelligence); MRUS technique; adapted sampling techniques; data streaming; dynamic learning; handling class imbalance; multiple random under sampling; sampling techniques; streaming data; training data; class imbalance; sampling; training set size
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Soft Computing and Intelligent Systems (SCIS) and 13th International Symposium on Advanced Intelligent Systems (ISIS), 2012 Joint 6th International Conference on
Conference_Location :
Kobe
Print_ISBN :
978-1-4673-2742-8
Type :
conf
DOI :
10.1109/SCIS-ISIS.2012.6505291
Filename :
6505291