DocumentCode :
3268415
Title :
Continuously maintaining quantile summaries of the most recent N elements over a data stream
Author :
Lin, Xuemin ; Lu, Hongjun ; Xu, Jian ; Yu, Jeffrey Xu
Author_Institution :
New South Wales Univ., Sydney, NSW, Australia
fYear :
2004
fDate :
30 March-2 April 2004
Firstpage :
362
Lastpage :
373
Abstract :
Statistics over the most recently observed data elements are often required in applications involving data streams, such as intrusion detection in network monitoring, stock price prediction in financial markets, Web log mining for access prediction, and user click stream mining for personalization. Among various statistics, computing quantile summary is probably most challenging because of its complexity. We study the problem of continuously maintaining quantile summary of the most recently observed N elements over a stream so that quantile queries can be answered with a guaranteed precision of εN. We developed a space efficient algorithm for predefined N that requires only one scan of the input data stream and O(log(ε2N)/ε+1/ε2) space in the worst cases. We also developed an algorithm that maintains quantile summaries for most recent N elements so that quantile queries on any most recent n elements (n ≤ N) can be answered with a guaranteed precision of εn. The worst case space requirement for this algorithm is only O(log2(εN)/ε2). Our performance study indicated that not only the actual quantile estimation error is far below the guaranteed precision but the space requirement is also much less than the given theoretical bound.
Keywords :
approximation theory; computational complexity; query processing; statistical analysis; Web log mining; access prediction; data stream; financial market; intrusion detection; network monitoring; quantile estimation error; quantile queries; quantile summary computing; stock price prediction; user click stream mining; Australia; Data mining; Estimation error; Histograms; Intrusion detection; Monitoring; Query processing; Statistics; Web pages; XML;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Engineering, 2004. Proceedings. 20th International Conference on
ISSN :
1063-6382
Print_ISBN :
0-7695-2065-0
Type :
conf
DOI :
10.1109/ICDE.2004.1320011
Filename :
1320011
Link To Document :
بازگشت