Title :
An Entropy-Based Data Summarization Algorithm in Data Stream System
Author :
Lin, Ouyang ; Qing-ping, Guo
Author_Institution :
School of Information Engineering, Wuhan University of Technology, Wuhan
Abstract :
Recently, there has been much interest in building stream processing applications. In these applications, also called data stream applications, data are usually unbounded, continuous, large in volume, fast-arriving, time-varying, and bursty. To process the input data stream under real-time constraints, the system must drop data when it is overloaded, and a key problem is deciding which data to drop. By predicting the data that will stream into the system, a data summarization algorithm can provide heuristic information that guides the data stream processing system in dropping overloaded input data. In this paper, an entropy-based data summarization algorithm (EBDS) is presented. EBDS is designed to produce samples that are "close" to the whole data set. It calculates the entropy of the data in a jumping window to achieve high predictive accuracy, and experiments indicate that EBDS does achieve high predictive accuracy.
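The abstract does not specify how the entropy is computed or how EBDS uses it to build its samples. As a minimal sketch of the underlying quantity only, assuming Shannon entropy (in bits) over the empirical frequencies of discrete values, and assuming a "jumping window" means consecutive, non-overlapping windows of fixed size, the per-window entropy could be computed as below. The function names `shannon_entropy`, `jumping_windows`, and `windowed_entropy` are hypothetical and this is not the EBDS algorithm itself.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Iterator, List


def shannon_entropy(window: List[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical value distribution in `window`.

    Assumption: data items are discrete values; an empty window yields 0.
    """
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def jumping_windows(stream: Iterable[Hashable], size: int) -> Iterator[List[Hashable]]:
    """Split a stream into consecutive, non-overlapping windows of `size` items.

    Assumption: a trailing partial window is yielded as-is.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    window: List[Hashable] = []
    for item in stream:
        window.append(item)
        if len(window) == size:
            yield window
            window = []  # jump: the next window starts with no overlap
    if window:
        yield window


def windowed_entropy(stream: Iterable[Hashable], size: int) -> List[float]:
    """Entropy of each jumping window, in stream order (hypothetical helper)."""
    return [shannon_entropy(w) for w in jumping_windows(stream, size)]


if __name__ == "__main__":
    data = ["a", "a", "b", "b", "a", "b", "c", "d"]
    print(windowed_entropy(data, 4))  # [1.0, 2.0]
```

A window in which one value dominates has entropy near 0, while a window of evenly spread values has maximal entropy, so a summarizer could treat the entropy as a rough signal of how much variety a sample must capture; how EBDS actually exploits this is described in the full paper, not here.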
Keywords :
data handling; entropy; data stream system; entropy-based data summarization; input data stream; real time constraint; stream processing application; Accuracy; Aggregates; Conferences; Data processing; Degradation; Entropy; Frequency; Sampling methods; Time factors; Wavelet transforms;
Conference_Title :
Pacific-Asia Workshop on Computational Intelligence and Industrial Application (PACIIA '08), 2008
Conference_Location :
Wuhan
Print_ISBN :
978-0-7695-3490-9
DOI :
10.1109/PACIIA.2008.132