DocumentCode :
2450023
Title :
Online learning from imbalanced data streams
Author :
Nguyen, Hien M. ; Cooper, Eric W. ; Kamei, Katsuari
Author_Institution :
Grad. Sch. of Sci. & Eng., Ritsumeikan Univ., Kusatsu, Japan
fYear :
2011
fDate :
14-16 Oct. 2011
Firstpage :
347
Lastpage :
352
Abstract :
Learning from imbalanced data has conventionally been conducted on stationary data sets. Recently, several methods have been proposed for mining imbalanced data streams, in which training data is read in consecutive data chunks. Each data chunk is treated as a conventional imbalanced data set, which makes it easy to apply sampling methods to balance it. However, one drawback of chunk-based learning methods is that the classification model is not updated until a full data chunk has been received. Therefore, this paper proposes a new method for online learning from imbalanced data streams that uses naive Bayes as the base learner. To deal with class imbalance, every new training instance from the minority class is used for learning, whereas a new instance from the majority class is used only with a small probability. In effect, this method corresponds to an under-sampling technique on imbalanced data streams. We show the effectiveness of the proposed online learning method on ten UCI data sets from various domains. Problems in the performance of naive Bayes on imbalanced data sets are also discussed.
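The sampling rule described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a binary stream whose minority label is known in advance, numeric attributes modeled by an incremental Gaussian naive Bayes (the record does not say which naive Bayes variant the paper uses), and a hypothetical fixed acceptance probability `p_majority` for majority-class instances, since the abstract only says "a small probability". The names `OnlineGaussianNB` and `train_online` are illustrative.

```python
import math
import random
from collections import defaultdict


class OnlineGaussianNB:
    """Incremental Gaussian naive Bayes (assumed variant, for illustration only)."""

    def __init__(self, n_features):
        self.n_features = n_features
        self.count = defaultdict(int)  # number of instances learned per class
        self.mean = {}                 # per-class running mean of each feature
        self.m2 = {}                   # per-class sum of squared deviations (Welford)

    def update(self, x, y):
        # Learn one instance immediately: no waiting for a full data chunk.
        if y not in self.mean:
            self.mean[y] = [0.0] * self.n_features
            self.m2[y] = [0.0] * self.n_features
        self.count[y] += 1
        n = self.count[y]
        for j, v in enumerate(x):
            delta = v - self.mean[y][j]
            self.mean[y][j] += delta / n
            self.m2[y][j] += delta * (v - self.mean[y][j])

    def _log_posterior(self, x, y):
        # log P(y) + sum_j log N(x_j; mean, var), up to a shared constant.
        n = self.count[y]
        score = math.log(n / sum(self.count.values()))
        for j, v in enumerate(x):
            var = self.m2[y][j] / n + 1e-6  # variance floor avoids division by zero
            score -= 0.5 * math.log(2 * math.pi * var)
            score -= (v - self.mean[y][j]) ** 2 / (2 * var)
        return score

    def predict(self, x):
        return max(self.count, key=lambda y: self._log_posterior(x, y))


def train_online(stream, model, minority_label, p_majority, rng):
    """Always learn minority instances; learn majority ones with probability p_majority."""
    for x, y in stream:
        if y == minority_label or rng.random() < p_majority:
            model.update(x, y)
    return model
```

Because the class priors are estimated only from the instances that were actually learned, skipping most majority-class instances also moves the learned prior toward the minority class, which is the usual effect of under-sampling.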
Keywords :
Bayes methods; data mining; learning (artificial intelligence); media streaming; sampling methods; UCI; data chunks; imbalanced data streams; naive Bayes method; online learning; stationary data sets; Accuracy; Data models; Equations; Learning systems; Mathematical model; Single photon emission computed tomography; Training; class imbalance; data streams; naive Bayes; under-sampling;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Soft Computing and Pattern Recognition (SoCPaR), 2011 International Conference of
Conference_Location :
Dalian
Print_ISBN :
978-1-4577-1195-4
Type :
conf
DOI :
10.1109/SoCPaR.2011.6089268
Filename :
6089268