DocumentCode :
1196899
Title :
Clustering data streams: Theory and practice
Author :
Guha, Sudipto ; Meyerson, Adam ; Mishra, Nina ; Motwani, Rajeev ; O'Callaghan, Liadan
Author_Institution :
Dept. of Comput. Sci., Pennsylvania Univ., Philadelphia, PA, USA
Volume :
15
Issue :
3
fYear :
2003
Firstpage :
515
Lastpage :
528
Abstract :
The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams.
Keywords :
data mining; facility location; learning (artificial intelligence); Web documents; approximation algorithms; clickstreams; data streams clustering; empirical evidence; real data streams; telephone records; Algorithm design and analysis; Approximation algorithms; Clustering algorithms; Data analysis; Meteorology; Partitioning algorithms; Statistics; Streaming media; Telephony; Web pages
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2003.1198387
Filename :
1198387