DocumentCode :
2734894
Title :
Clustering data streams
Author :
Guha, Sudipto ; Mishra, Nina ; Motwani, Rajeev ; O´Callaghan, Liadan
Author_Institution :
Dept. of Comput. Sci., Stanford Univ., CA, USA
fYear :
2000
fDate :
2000
Firstpage :
359
Lastpage :
366
Abstract :
We study clustering under the data stream model of computation where: given a sequence of points, the objective is to maintain a consistently good clustering of the sequence observed so far, using a small amount of memory and time. The data stream model is relevant to new classes of applications involving massive data sets, such as Web click stream analysis and multimedia data analysis. We give constant-factor approximation algorithms for the k-median problem in the data stream model of computation in a single pass. We also show negative results implying that our algorithms cannot be improved in a certain sense
Keywords :
computational complexity; data analysis; deterministic algorithms; pattern clustering; very large databases; Web click stream analysis; constant-factor approximation algorithms; data stream clustering; data stream model; deterministic algorithms; k-median problem; massive data sets; multimedia data analysis; point sequence; Application software; Approximation algorithms; Clustering algorithms; Computational modeling; Computer science; Data analysis; Laboratories; Streaming media; Telephony; Web pages;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on
Conference_Location :
Redondo Beach, CA
ISSN :
0272-5428
Print_ISBN :
0-7695-0850-2
Type :
conf
DOI :
10.1109/SFCS.2000.892124
Filename :
892124
Link To Document :
بازگشت