• DocumentCode
    23455
  • Title

    Distributed Information Theoretic Clustering

  • Author

    Pengcheng Shen ; Chunguang Li

  • Author_Institution
    Dept. of Inf. Sci. & Electron. Eng., Zhejiang Univ., Hangzhou, China
  • Volume
    62
  • Issue
    13
  • fYear
    2014
  • fDate
    1-Jul-14
  • Firstpage
    3442
  • Lastpage
    3453
  • Abstract
    Distributed data collection and analysis over networks are ubiquitous, especially over wireless sensor networks (WSNs). Distributed clustering is one of the most important topics in distributed data analysis; its goal is to explore the hidden structure of data collected or stored at geographically distributed nodes. In recent years, several distributed data clustering techniques have been developed based on the K-means algorithm or the Gaussian mixture model. In these methods, data structures are captured by measures based only on first- and second-order statistics. When the structure of the cluster data is complicated, these statistics are insufficient and may lead to unsatisfactory clustering results. In such cases, information theoretic measures can achieve better clustering performance because they take the whole distribution of the cluster data into account. In this work, we incorporate an information theoretic measure into the cost function of distributed clustering and present a linear and a kernel distributed clustering algorithm. In both algorithms, each node solves a local clustering problem through diffusion cooperation with its neighboring nodes. To preserve privacy and save communication costs, nodes exchange only a few parameters, rather than the original data, with their one-hop neighbors. Simulation results show that the proposed distributed algorithms achieve clustering results almost as good as those of the corresponding centralized information theoretic clustering algorithms on both synthetic and real data.
  • Keywords
    data acquisition; data analysis; data structures; pattern clustering; probability; statistics; wireless sensor networks; Gaussian mixture model; K-means algorithm; WSN; centralized information theoretic clustering algorithms; cluster data; communication costs; cost function; data structures; diffusion cooperation; distributed algorithms; distributed data analysis; distributed data clustering techniques; distributed data collection; distributed information theoretic clustering; first order statistics; geographically distributed nodes; information theoretic measures; kernel distributed clustering algorithms; local clustering problem; second order statistics; wireless sensor networks; Approximation methods; Clustering algorithms; Cost function; Data models; Distributed databases; Mutual information; Signal processing algorithms; Diffusion cooperation; discriminative clustering; distributed clustering; information theory; mutual information
  • fLanguage
    English
  • Journal_Title
    Signal Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1053-587X
  • Type

    jour

  • DOI
    10.1109/TSP.2014.2327010
  • Filename
    6822602