Title :
Classification for concept-drifting data streams with limited amount of labeled data
Author :
Gong-De, Guo ; Nan, Li ; Li-Fei, Chen
Author_Institution :
School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China
Abstract :
Most existing classification approaches for concept-drifting data streams assume that the true label of an instance can be accessed right after the instance is classified, and they use that label to detect concept drift and to adjust the current model. This assumption is impractical in real-world applications because manual labeling of data is both costly and time-consuming. We apply a novel technique to overcome this problem. The proposed method uses the model clusters generated by the fast KNNModel algorithm to classify the instances in the data stream. Using only unlabeled test instances, the arrival of a novel class or a drift in the underlying concept of a class is detected when the number of instances not covered by any model cluster increases rapidly, at a certain significance level, compared with before. Domain experts are asked to label a few instances to adjust the current model if and only if concept drift occurs. Experimental results on both synthetic and real data streams show that, compared with traditional classification algorithms, our method achieves comparable or better efficacy and efficiency while using only a small amount of labeled data.
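The detection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how model clusters are represented or which significance test is used. The sketch assumes each model cluster is a hypersphere (a center, a radius, and a class label), and it swaps in a one-sided two-proportion z-test to decide whether the fraction of uncovered instances has increased. The names ModelCluster, covering_label, and drift_detected are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ModelCluster:
    # Assumed representation: a hypersphere with a class label.
    center: Sequence[float]
    radius: float
    label: str


def covering_label(x: Sequence[float],
                   clusters: Sequence[ModelCluster]) -> Optional[str]:
    """Return the label of the nearest cluster that covers x, or None
    if no cluster covers x (an 'uncovered' instance)."""
    best_label = None
    best_dist = math.inf
    for cluster in clusters:
        dist = math.dist(x, cluster.center)  # Euclidean distance
        if dist <= cluster.radius and dist < best_dist:
            best_label, best_dist = cluster.label, dist
    return best_label


def drift_detected(ref_uncovered: int, ref_total: int,
                   cur_uncovered: int, cur_total: int,
                   z_crit: float = 1.645) -> bool:
    """Assumed test: one-sided two-proportion z-test for an increase in
    the uncovered rate. z_crit = 1.645 corresponds to roughly a 5% level."""
    if ref_total == 0 or cur_total == 0:
        return False
    pooled = (ref_uncovered + cur_uncovered) / (ref_total + cur_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / ref_total + 1 / cur_total))
    if std_err == 0:
        return False  # both rates are 0 or both are 1: no evidence of change
    z = (cur_uncovered / cur_total - ref_uncovered / ref_total) / std_err
    return z > z_crit


clusters = [ModelCluster((0.0, 0.0), 1.0, "a"), ModelCluster((5.0, 5.0), 1.0, "b")]
label_near = covering_label((0.5, 0.0), clusters)    # inside the first cluster
label_far = covering_label((10.0, 10.0), clusters)   # covered by no cluster
jump = drift_detected(5, 100, 30, 100)    # uncovered rate rises from 5% to 30%
stable = drift_detected(5, 100, 6, 100)   # uncovered rate barely changes
```

In this sketch, only when drift_detected returns True would a few instances be sent to a domain expert for labeling, which mirrors the "if and only if concept drift occurs" condition stated in the abstract.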
Keywords :
KNNModel; concept drift; data streams
Conference_Title :
Automatic Control and Artificial Intelligence (ACAI 2012), International Conference on
Conference_Location :
Xiamen
Electronic_ISBN :
978-1-84919-537-9
DOI :
10.1049/cp.2012.1060