Title :
Finding frequent items in sliding windows with multinomially-distributed item frequencies
Author :
Golab, Lukasz ; DeHaan, David ; Lopez-Ortiz, Alejandro ; Demaine, Erik D.
Author_Institution :
Waterloo Univ., Canada
Abstract :
In this paper, we present an algorithm for identifying frequently occurring items within a sliding window of the last N items seen over an infinite data stream, given the following constraints: (1) the relative frequencies of the item types can vary over the lifetime of the stream, provided that they vary sufficiently slowly that for any sliding window of N tuples, with high probability the window could have been generated by a multinomial distribution. We refer to this as the drifting distribution model in the full version of this paper (Golab et al., 2004). (2) The entire sliding window does not fit in the available memory (otherwise, we could simply count all the distinct item types and return those whose frequencies exceed some threshold). (3) The stream may arrive at a high rate, so only a constant number of operations (amortized) is allowed for the processing of each item.
Keywords :
data mining; database management systems; pattern recognition; probability; data stream; drifting distribution; frequently occurring item identification; multinomial distribution; multinomially-distributed item frequencies; probability; sliding windows; Conference management; Councils; Counting circuits; Databases; Delay; Frequency conversion; Identity management systems; Partitioning algorithms;
Conference_Titel :
Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on
Print_ISBN :
0-7695-2146-0
DOI :
10.1109/SSDM.2004.1311242