Title :
Range-efficient computation of F0 over massive data streams
Author :
Pavan, A. ; Tirthapura, Srikanta
Author_Institution :
Dept. of Comput. Sci., Iowa State Univ., Ames, IA, USA
Abstract :
Efficient one-pass computation of F0, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider the problem of efficiently estimating F0 of a data stream where each element of the stream is an interval of integers. We present a randomized algorithm which gives an (ε, δ) approximation of F0, with the following time complexity (n is the size of the universe of the items): (1) the amortized processing time per interval is O(log1/δ log n/ε). (2) The time to answer a query for F0 is O(log1/δ). The workspace used is O(1/ε2log1/δlogn) bits. Our algorithm improves upon a previous algorithm by Bar-Yossef Kumar and Sivakumar (2002), which requires O(1/ε5log1/δlog5n) processing time per item. Our algorithm can be used to compute the max-dominance norm of a stream of multiple signals, and significantly improves upon the current best bounds due to Cormode and Muthukrishnan (2003). This also provides efficient and novel solutions for data aggregation problems in sensor networks studied by Nath and Gibbons (2004) and Considine et. al. (2004).
Keywords :
computational complexity; data handling; database management systems; query processing; randomised algorithms; data stream processing; query processing; randomized algorithm; range-efficient F0 computation; time complexity; Aggregates; Algorithm design and analysis; Approximation algorithms; Computer networks; Databases; Frequency estimation; Query processing;
Conference_Titel :
Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on
Print_ISBN :
0-7695-2285-8
DOI :
10.1109/ICDE.2005.118