Regional Information Center for Science and Technology - A Low-Granularity Classifier for Data Streams with Concept Drifts and Biased Class Distribution

DocumentCode :

1093442

Title :

A Low-Granularity Classifier for Data Streams with Concept Drifts and Biased Class Distribution

Author :

Wang, Peng ; Wang, Haixun ; Wu, Xiaochen ; Wang, Wei ; Shi, Baile

Author_Institution :

Fudan Univ., Shanghai

Volume :

Issue :

fYear :

2007

Firstpage :

1202

Lastpage :

1213

Abstract :

Many applications track streaming data for actionable alerts, such as network intrusions, transaction fraud, and bio-surveillance abnormalities. Stream classification models are built for this purpose. Because of concept drift, keeping a model up to date has become one of the most challenging tasks in mining data streams. State-of-the-art approaches, including both incrementally updated classifiers and ensemble classifiers, have shown that model updating is a very costly process. In this paper, we show that reducing model granularity reduces the update cost, because fine-grained models let us efficiently pinpoint the local components of the model that are affected by concept drift. Fine granularity also lets us derive new model components that reflect the current data distribution, thus avoiding expensive updates on a global scale. Furthermore, the actionable alerts being monitored are usually rare events, a problem that existing stream classifiers cannot handle. We address this problem and show that the low-granularity classifier handles rare events in stream data with ease. Experiments on real and synthetic data show that our approach maintains good prediction accuracy at a fraction of the model updating cost of state-of-the-art approaches.
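The abstract's central idea, that a fine-grained model lets drift be localized and repaired without a global rebuild, can be illustrated with a minimal Python sketch. It assumes a hypothetical rule-based model in which each rule is an independent component holding a sliding window of the true labels it has recently seen; when a rule's windowed accuracy falls below a threshold, only that rule is replaced by one predicting the recent majority class. The names `Rule`, `LowGranularityClassifier`, `min_accuracy`, and `min_samples`, and the window size of 50, are illustrative assumptions. This is not the authors' algorithm, and it does not address the rare-class problem.

```python
from collections import Counter, deque
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical illustration of local (per-component) drift repair;
# not the method described in the paper.

@dataclass
class Rule:
    conditions: dict  # attribute -> required value
    label: str        # class predicted when all conditions hold
    # Sliding window of true labels observed for records this rule matched.
    recent: deque = field(default_factory=lambda: deque(maxlen=50))

    def matches(self, record: dict) -> bool:
        return all(record.get(k) == v for k, v in self.conditions.items())

    def accuracy(self) -> float:
        if not self.recent:
            return 1.0
        return sum(1 for y in self.recent if y == self.label) / len(self.recent)


class LowGranularityClassifier:
    def __init__(self, rules, default_label, min_accuracy=0.6, min_samples=20):
        self.rules = list(rules)
        self.default_label = default_label
        self.min_accuracy = min_accuracy  # assumed drift threshold
        self.min_samples = min_samples    # evidence required before repairing

    def predict(self, record: dict) -> str:
        # First-match semantics: the first rule whose conditions hold decides.
        for rule in self.rules:
            if rule.matches(record):
                return rule.label
        return self.default_label

    def update(self, record: dict, true_label: str) -> Optional[int]:
        """Feed one labeled record; return the index of a repaired rule, if any."""
        for i, rule in enumerate(self.rules):
            if rule.matches(record):
                rule.recent.append(true_label)
                if (len(rule.recent) >= self.min_samples
                        and rule.accuracy() < self.min_accuracy):
                    # Local repair: relabel only this rule with the recent
                    # majority class; every other rule is left untouched.
                    new_label = Counter(rule.recent).most_common(1)[0][0]
                    self.rules[i] = Rule(dict(rule.conditions), new_label)
                    return i
                return None
        return None
```

For example, if traffic matching one rule starts being labeled as intrusions, only that rule is repaired:

```python
rules = [Rule({"proto": "tcp", "port": 22}, "normal"), Rule({"proto": "udp"}, "normal")]
clf = LowGranularityClassifier(rules, default_label="normal")
for _ in range(30):
    clf.update({"proto": "tcp", "port": 22}, "intrusion")
print(clf.predict({"proto": "tcp", "port": 22}))  # intrusion
```

The repair touches a single rule and its statistics; the remaining rules keep their labels and windows, which is the kind of saving over global retraining that the abstract describes.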

Keywords :

data analysis; data mining; pattern classification; biased class distribution; concept drifts; data stream mining; low-granularity data streams classifier; Accuracy; Association rules; Costs; Data mining; Decision trees; Feedback; Monitoring; Predictive models; Training data; Ubiquitous computing; Classification; association rule; concept drift; data stream;

fLanguage :

English

Journal_Title :

Knowledge and Data Engineering, IEEE Transactions on

Publisher :

IEEE

ISSN :

1041-4347

Type :

jour

DOI :

10.1109/TKDE.2007.1057

Filename :

4288140

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1093442