DocumentCode
3323565
Title
Approximate Clustering on Distributed Data Streams
Author
Zhang, Qi ; Liu, Jinze ; Wang, Wei
Author_Institution
Dept. of Comput. Sci., Univ. of North Carolina, Chapel Hill, NC
fYear
2008
fDate
7-12 April 2008
Firstpage
1131
Lastpage
1139
Abstract
We investigate the problem of clustering on distributed data streams. In particular, we consider the k-median clustering on stream data arriving at distributed sites which communicate through a routing tree. Distributed clustering on high speed data streams is a challenging task due to limited communication capacity, storage space, and computing power at each site. In this paper, we propose a suite of algorithms for computing (1 + epsiv) -approximate k-median clustering over distributed data streams under three different topology settings: topology-oblivious, height-aware, and path-aware. Our algorithms reduce the maximum per node transmission to polylog N (opposed to Omega(N) for transmitting the raw data). We have simulated our algorithms on a distributed stream system with both real and synthetic datasets composed of millions of data. In practice, our algorithms are able to reduce the data transmission to a small fraction of the original data. Moreover, our results indicate that the algorithms are scalable with respect to the data volume, approximation factor, and the number of sites.
Keywords
computational complexity; distributed databases; pattern clustering; trees (mathematics); approximate k-median clustering; distributed data streams; height-aware algorithm; path-aware algorithm; routing tree; topology-oblivious algorithm; Approximation algorithms; Clustering algorithms; Computational modeling; Computer science; Data communication; Delay; Distributed computing; Monitoring; Routing; Topology;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location
Cancun
Print_ISBN
978-1-4244-1836-7
Electronic_ISBN
978-1-4244-1837-4
Type
conf
DOI
10.1109/ICDE.2008.4497522
Filename
4497522
Link To Document