Title :
Weighted Naïve Bayes Classifier with Forgetting for Drifting Data Streams
Author :
Bartosz Krawczyk;Michal Wozniak
Author_Institution :
Dept. of Syst. &
Abstract :
Mining massive data streams in real time is one of the contemporary challenges for machine learning systems. This domain encompasses many of the difficulties hidden beneath the term Big Data. We deal with massive, incoming information that must be processed on the fly, with the lowest possible response delay. We are forced to take into account time, memory and quality constraints. Our models must be able to quickly process large collections of data and swiftly adapt to changes (shifts and drifts) occurring in data streams. In this paper, we propose a novel version of the simple yet effective Naïve Bayes classifier for mining streams. We add a weighting module that automatically assigns an importance factor to each object extracted from the stream. The higher the weight, the greater the influence a given object exerts on the classifier training procedure. We assume that our model works in a non-stationary environment in the presence of concept drift. To allow our classifier to quickly adapt its properties to evolving data, we equip it with a forgetting principle implemented as weight decay. With each passing iteration, the importance of previous objects is decreased until they are discarded from the data collection. We propose an efficient sigmoidal function for modeling the forgetting rate. Experimental analysis, carried out on a number of large data streams with concept drift, shows that our weighted Naïve Bayes classifier displays highly satisfactory performance in comparison with state-of-the-art stream classifiers.
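The mechanism described in the abstract (per-object weights, sigmoidal decay with age, and removal of objects whose weight has faded) can be illustrated with a minimal Python sketch. The abstract does not give the exact sigmoid or the weighting module, so everything here is an assumption made for illustration: the logistic form of sigmoid_weight, the parameters midpoint, steepness and min_weight, the Gaussian likelihood for numeric features, and the class name WeightedGaussianNB. It is not the authors' implementation.

```python
import math
from collections import deque


def sigmoid_weight(age, midpoint=50.0, steepness=0.1):
    """Assumed forgetting curve: close to 1 for fresh objects, falling toward 0 with age."""
    return 1.0 / (1.0 + math.exp(steepness * (age - midpoint)))


class WeightedGaussianNB:
    """Illustrative weighted Naive Bayes over a buffer of aging objects (hypothetical design)."""

    def __init__(self, midpoint=50.0, steepness=0.1, min_weight=0.01):
        self.midpoint = midpoint
        self.steepness = steepness
        self.min_weight = min_weight
        self.buffer = deque()  # each entry: [features, label, age], oldest first

    def _weight(self, age):
        return sigmoid_weight(age, self.midpoint, self.steepness)

    def learn(self, x, y):
        # One iteration passes: every stored object gets older.
        for item in self.buffer:
            item[2] += 1
        # Discard objects whose weight has decayed below the threshold.
        while self.buffer and self._weight(self.buffer[0][2]) < self.min_weight:
            self.buffer.popleft()
        self.buffer.append([list(x), y, 0])

    def predict(self, x):
        # Accumulate weighted counts, sums and squared sums per class.
        stats = {}
        for feats, label, age in self.buffer:
            w = self._weight(age)
            s = stats.setdefault(label, [0.0, [0.0] * len(feats), [0.0] * len(feats)])
            s[0] += w
            for j, v in enumerate(feats):
                s[1][j] += w * v
                s[2][j] += w * v * v
        total = sum(s[0] for s in stats.values())
        best, best_score = None, -math.inf
        for label, (wsum, sx, sxx) in stats.items():
            score = math.log(wsum / total)  # weighted class prior
            for j, v in enumerate(x):
                mean = sx[j] / wsum
                var = max(sxx[j] / wsum - mean * mean, 1e-6)  # variance floor
                score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            if score > best_score:
                best, best_score = label, score
        return best
```

Calling learn(x, y) for each arriving labelled object and predict(x) for each query is enough to exercise the sketch. Recomputing the weighted statistics on every prediction keeps the code short; a streaming implementation would more likely maintain them incrementally.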
Keywords :
"Training","Adaptation models","Data mining","Memory management","Detectors","Data models","Probability"
Conference_Titel :
Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on
DOI :
10.1109/SMC.2015.375