Title :
A comparative study on sampling techniques for handling class imbalance in streaming data
Author :
Nguyen, Hien M. ; Cooper, Eric W. ; Kamei, Kentaro
Author_Institution :
Grad. Sch. of Sci. & Eng., Ritsumeikan Univ., Kusatsu, Japan
Abstract :
Sampling is the most popular approach for handling the class imbalance problem in training data. A number of recent studies have adapted sampling techniques to dynamic learning settings in which the training set is not fixed but grows gradually over time. This paper presents an empirical study comparing over-sampling and under-sampling techniques in the context of data streaming. Experimental results show that under-sampling performs better than over-sampling at smaller training set sizes. All sampling techniques, however, perform comparably once the training set becomes larger. The study also suggests that a multiple random under-sampling (MRUS) technique is a good choice for applications with imbalanced, streaming data, because MRUS is the most effective technique while still running at high speed.
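The abstract does not spell out how MRUS is built, so the following is only a minimal sketch under one plausible reading: several independent random under-samples of the larger classes are drawn, and each is paired with all examples of the smallest class to form a balanced training set. The function name multiple_random_undersample and the parameters n_subsets and seed are hypothetical and do not come from the paper.

```python
import random
from collections import Counter


def multiple_random_undersample(samples, labels, n_subsets=3, seed=0):
    """Build several balanced training sets by random under-sampling.

    Assumption (not taken from the paper): each subset keeps every
    example of the smallest class and an equally sized random draw,
    without replacement, from each of the other classes.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_label = min(counts, key=counts.get)
    minority_size = counts[minority_label]

    # Group the examples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    subsets = []
    for _ in range(n_subsets):
        subset = []
        for label, items in by_class.items():
            if label == minority_label:
                chosen = items  # keep all minority examples
            else:
                chosen = rng.sample(items, minority_size)  # under-sample
            subset.extend((x, label) for x in chosen)
        subsets.append(subset)
    return subsets
```

In a streaming setting, one possible use is to call this function again each time the training set grows, train one classifier per balanced subset, and combine their predictions; how the paper combines them is not stated in the abstract.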
Keywords :
data handling; learning (artificial intelligence); MRUS technique; adapted sampling techniques; data streaming; dynamic learning; handling class imbalance; multiple random under-sampling; sampling techniques; streaming data; training data; class imbalance; sampling; training set size
Conference_Title :
2012 Joint 6th International Conference on Soft Computing and Intelligent Systems (SCIS) and 13th International Symposium on Advanced Intelligent Systems (ISIS)
Conference_Location :
Kobe
Print_ISBN :
978-1-4673-2742-8
DOI :
10.1109/SCIS-ISIS.2012.6505291