Title :
SAE: Social Adaptive Ensemble classifier for data streams
Author :
Murilo Gomes, Heitor; Enembreck, Fabricio
Author_Institution :
Programa de Pos-Grad. em Inf. (PPGIA), Pontificia Univ. Catolica do Parana (PUCPR), Curitiba, Brazil
Abstract :
This work presents a new ensemble classifier for data stream classification that uses a social network abstraction: the Social Adaptive Ensemble (SAE). In data stream classification, concept drift is considered one of the most difficult and important issues to address. Ensemble classifiers can be applied successfully to data streams as long as the ensemble adapts efficiently when a concept drift occurs. The SAE algorithm inherits strategies from other ensemble methods, such as Online Bagging [4] and DWM [2], and merges them with the notion of connectivity between classifiers that are similar with respect to their individual predictions. The relational data obtained by measuring similarities between classifiers is used to arrange ensemble members in a social network structure, which allows us to identify subgroups (subnetworks) of highly similar classifiers. Identifying similar classifiers allows us to implement a combination strategy that first combines predictions within each group of similar classifiers and then combines the group results into the final prediction. Moreover, this combination strategy assigns more weight to the predictions of recently added classifiers during concept drifts, since these classifiers are dissimilar to all other existing classifiers. The similarity between classifiers is also used to identify and remove redundant classifiers, which saves system resources and sometimes improves accuracy. We present empirical experiments with synthetic data streams containing abrupt drift, gradual drift and no drift. The results show that SAE is a valid option for stream classification, especially when data stream characteristics (e.g. the presence of abrupt drifts) are not known in advance and system resources, such as CPU time and memory space, are a concern.
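The two-stage combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The following are all assumptions made for illustration: the agreement-based similarity measure, the fixed `threshold`, the use of connected components as subnetworks, the one-vote-per-subnetwork rule, and the names `agreement`, `subnetworks` and `combine`.

```python
from collections import Counter


def agreement(preds_a, preds_b):
    """Fraction of instances on which two classifiers predicted the same label.

    Assumed similarity measure; both lists must be non-empty and equally long.
    """
    same = sum(a == b for a, b in zip(preds_a, preds_b))
    return same / len(preds_a)


def subnetworks(history, threshold=0.9):
    """Group classifiers into connected components of the similarity graph.

    history maps a (hypothetical) classifier name to its recent predictions.
    Two classifiers are linked when their agreement reaches the threshold.
    """
    names = list(history)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if agreement(history[a], history[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


def combine(current, groups):
    """Majority vote inside each subnetwork, then one vote per subnetwork."""
    group_votes = [
        Counter(current[n] for n in group).most_common(1)[0][0]
        for group in groups
    ]
    return Counter(group_votes).most_common(1)[0][0]
```

Under these assumptions, a recently added classifier that disagrees with every existing member forms its own subnetwork and therefore carries a full vote in the second stage, whereas a group of near-identical classifiers shares a single vote. Ties are broken arbitrarily in this sketch.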
Keywords :
data analysis; learning (artificial intelligence); pattern classification; CPU time; DWM algorithm; SAE algorithm; concept drift; data stream characteristics; data stream classification; dynamic weighted majority; memory space; online bagging; relational data; social adaptive ensemble classifier; social network abstraction; social network structure; Accuracy; Bagging; Classification algorithms; Generators; Prediction algorithms; Training; Vegetation; concept drift; data stream classification; data stream mining; ensemble classifier;
Conference_Title :
Computational Intelligence and Data Mining (CIDM), 2013 IEEE Symposium on
Conference_Location :
Singapore
DOI :
10.1109/CIDM.2013.6597237