DocumentCode :
3134692
Title :
Finding frequent items in sliding windows with multinomially-distributed item frequencies
Author :
Golab, Lukasz ; DeHaan, David ; Lopez-Ortiz, Alejandro ; Demaine, Erik D.
Author_Institution :
Waterloo Univ., Canada
fYear :
2004
fDate :
21-23 June 2004
Firstpage :
425
Lastpage :
426
Abstract :
In this paper, we present an algorithm for identifying frequently occurring items within a sliding window of the last N items seen over an infinite data stream, given the following constraints: (1) the relative frequencies of the item types can vary over the lifetime of the stream, provided that they vary sufficiently slowly that for any sliding window of N tuples, with high probability the window could have been generated by a multinomial distribution. We refer to this as the drifting distribution model in the full version of this paper (Golab et al., 2004). (2) The entire sliding window does not fit in the available memory (otherwise, we could simply count all the distinct item types and return those whose frequencies exceed some threshold). (3) The stream may arrive at a high rate, so only a constant number of operations (amortized) is allowed for the processing of each item.
Keywords :
data mining; database management systems; pattern recognition; probability; data stream; drifting distribution; frequently occurring item identification; multinomial distribution; multinomially-distributed item frequencies; probability; sliding windows; Conference management; Councils; Counting circuits; Databases; Delay; Frequency conversion; Identity management systems; Partitioning algorithms;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on
ISSN :
1099-3371
Print_ISBN :
0-7695-2146-0
Type :
conf
DOI :
10.1109/SSDM.2004.1311242
Filename :
1311242
Link To Document :
بازگشت