Title :
Contrary to Popular Belief Incremental Discretization can be Sound, Computationally Efficient and Extremely Useful for Streaming Data
Author :
Webb, Geoffrey I.
Author_Institution :
Fac. of Inf. Technol., Monash Univ., Melbourne, VIC, Australia
Abstract :
Discretization of streaming data has received surprisingly little attention. This might be because streaming data require incremental discretization with cut points that may vary over time and this is perceived as undesirable. We argue, to the contrary, that it can be desirable for a discretization to evolve in synchronization with an evolving data stream, even when the learner assumes that attribute values' meanings remain invariant over time. We examine the issues associated with discretization in the context of distribution drift and develop computationally efficient incremental discretization algorithms. We show that discretization can reduce the error of a classical incremental learner and that allowing a discretization to drift in synchronization with distribution drift can further reduce error.
Keywords :
data handling; synchronisation; data streaming; distribution drift; incremental discretization; synchronization; Approximation algorithms; Context; Electricity; Histograms; Synchronization; Time-frequency analysis; Vectors;
Conference_Title :
Data Mining (ICDM), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4799-4303-6
DOI :
10.1109/ICDM.2014.123