Title :
An ensemble classification approach for handling spatio-temporal drifts in partially labeled data streams
Author :
Sethi, Tegjyot Singh ; Kantardzic, Mehmed ; Arabmakki, Elaheh ; Hu, Hanquing
Author_Institution :
Dept. of Comput. Eng. & Comput. Sci., Univ. of Louisville, Louisville, KY, USA
Abstract :
The classification of streaming data requires learning in an environment where the distribution of the incoming data may change continuously. Stream classification methodologies need to adapt to these changes under limited time and memory resources. Moreover, it is not realistic to expect all the samples in the stream to be labeled, as labeling is often time-consuming and expensive. In this paper, a new ensemble classification approach is proposed that can handle spatio-temporal drifts in streams even when labeling is limited. The proposed methodology uses a grid density clustering approach to track drifts in the spatial configuration of the data, and maintains a set of classifier models local to each cluster to track its evolution over time. Structured weighted aggregation of the models across all clusters produces the overall prediction for a new sample. Additionally, a uniform sampling approach amenable to the grid representation of the clusters is proposed, which selects samples to be labeled while preserving the grid density information of the stream. This allows better selection of representative samples for labeling, and hence improved drift detection and handling, while staying within the labeling budget. Experimental comparison with state-of-the-art drift handling systems shows that the proposed methodology achieves high classification performance with a manageable ensemble size and with only 10% of the samples labeled.
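The grid-based sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the fixed `cell_size`, and the per-cell rounding rule are all assumptions made for illustration. The sketch bins points into grid cells and draws a uniform random sample inside each cell, sized in proportion to the cell's population, so that the labeled subset roughly mirrors the grid density of the data.

```python
import math
import random
from collections import defaultdict


def grid_cell(point, cell_size):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(math.floor(x / cell_size) for x in point)


def select_for_labeling(points, cell_size=0.1, budget=0.1, seed=0):
    """Return indices of points to label (hypothetical helper, not from the paper).

    Each cell contributes about `budget` of its own points, chosen uniformly
    at random, so dense cells stay dense in the labeled subset. Very sparse
    cells may round down to zero labels, and the total can deviate slightly
    from `budget * len(points)` because of per-cell rounding.
    """
    rng = random.Random(seed)

    # Group point indices by the grid cell they fall into.
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        cells[grid_cell(point, cell_size)].append(idx)

    chosen = []
    for members in cells.values():
        quota = round(budget * len(members))  # per-cell share of the budget
        chosen.extend(rng.sample(members, quota))
    return sorted(chosen)


# Two cells: 100 points in a dense cell, 50 points in a sparser one.
dense = [(0.01 + 0.0005 * i, 0.02) for i in range(100)]
sparse = [(0.51 + 0.0005 * i, 0.52) for i in range(50)]
chosen = select_for_labeling(dense + sparse, cell_size=0.1, budget=0.1)
```

With a 10% budget, the dense cell contributes 10 labeled samples and the sparser cell 5, which preserves the 2:1 density ratio between them. A streaming version would apply the same rule per chunk or window rather than to a fixed list.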
Keywords :
learning (artificial intelligence); pattern classification; pattern clustering; sampling methods; classification performance; classifier models; ensemble classification approach; grid density clustering approach; grid density information preservation; labeling budget; partially labeled data stream; spatial configuration; spatio-temporal drift handling; stream classification methodology; streaming data classification; structured weighted aggregation; uniform sampling approach; Adaptation models; Clustering algorithms; Computational modeling; Data models; Labeling; Mathematical model; Predictive models
Conference_Title :
Information Reuse and Integration (IRI), 2014 IEEE 15th International Conference on
DOI :
10.1109/IRI.2014.7051961