DocumentCode
2358596
Title
Query-Aware Sampling for Data Streams
Author
Johnson, Theodore ; Muthukrishnan, S. ; Shkapenyuk, Vladislav ; Spatscheck, Oliver
Author_Institution
AT&T Labs-Res., Murray Hill
fYear
2007
fDate
17-20 April 2007
Firstpage
664
Lastpage
673
Abstract
Data stream management systems are useful when large volumes of data need to be processed in real time. Examples include monitoring network traffic, monitoring financial transactions, and analyzing large-scale scientific data feeds. These applications have varying data rates and often show bursts of high activity that overload the system, frequently at the instants most critical for analysis (e.g., network attacks, financial spikes). Therefore, load shedding is necessary to preserve the stability of the system, gracefully degrade its performance, and extract answers. Existing methods for load shedding in a general-purpose data stream query system use random sampling of tuples, essentially independent of the query. While this technique is acceptable for some queries, the results may be meaningless or even incorrect for other queries. In principle, a number of different query-dependent sampling methods exist, but they work only for particular queries. In this paper, we show how to perform query-aware sampling (semantic sampling) which works in general. We present methods for analyzing any given query, choosing sampling methods judiciously, and reconciling the sampling methods required by different queries in a query set. We conclude with experiments on a high-speed data stream that demonstrate with different query sets that our method produces accurate results while decreasing the load significantly.
Keywords
data analysis; query processing; sampling methods; data stream management system; financial transaction monitoring; large scale scientific data feed analysis; load shedding; network traffic monitoring; query-aware random sampling; Aggregates; Computer crime; Feeds; Large-scale systems; Monitoring; Protection; Protocols; Real time systems; Sampling methods; Telecommunication traffic
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering Workshop, 2007 IEEE 23rd International Conference on
Conference_Location
Istanbul
Print_ISBN
978-1-4244-0832-0
Electronic_ISBN
978-1-4244-0832-0
Type
conf
DOI
10.1109/ICDEW.2007.4401053
Filename
4401053