• DocumentCode
    3323565
  • Title

    Approximate Clustering on Distributed Data Streams

  • Author

    Zhang, Qi ; Liu, Jinze ; Wang, Wei

  • Author_Institution
    Dept. of Comput. Sci., Univ. of North Carolina, Chapel Hill, NC
  • fYear
    2008
  • fDate
    7-12 April 2008
  • Firstpage
    1131
  • Lastpage
    1139
  • Abstract
    We investigate the problem of clustering on distributed data streams. In particular, we consider the k-median clustering on stream data arriving at distributed sites which communicate through a routing tree. Distributed clustering on high speed data streams is a challenging task due to limited communication capacity, storage space, and computing power at each site. In this paper, we propose a suite of algorithms for computing (1 + epsiv) -approximate k-median clustering over distributed data streams under three different topology settings: topology-oblivious, height-aware, and path-aware. Our algorithms reduce the maximum per node transmission to polylog N (opposed to Omega(N) for transmitting the raw data). We have simulated our algorithms on a distributed stream system with both real and synthetic datasets composed of millions of data. In practice, our algorithms are able to reduce the data transmission to a small fraction of the original data. Moreover, our results indicate that the algorithms are scalable with respect to the data volume, approximation factor, and the number of sites.
  • Keywords
    computational complexity; distributed databases; pattern clustering; trees (mathematics); approximate k-median clustering; distributed data streams; height-aware algorithm; path-aware algorithm; routing tree; topology-oblivious algorithm; Approximation algorithms; Clustering algorithms; Computational modeling; Computer science; Data communication; Delay; Distributed computing; Monitoring; Routing; Topology;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4244-1836-7
  • Electronic_ISBN
    978-1-4244-1837-4
  • Type

    conf

  • DOI
    10.1109/ICDE.2008.4497522
  • Filename
    4497522