• DocumentCode
    3125705
  • Title
    BibClus: A Clustering Algorithm of Bibliographic Networks by Message Passing on Center Linkage Structure
  • Author
    Xu, Xiaoran; Deng, Zhi-Hong
  • Author_Institution
    Key Lab. of Machine Perception (Minist. of Educ.), Peking Univ., Beijing, China
  • fYear
    2011
  • fDate
    11-14 Dec. 2011
  • Firstpage
    864
  • Lastpage
    873
  • Abstract
    Multi-type objects with multi-type relations are ubiquitous in real-world networks, e.g. bibliographic networks. Such networks are also called heterogeneous information networks. However, little research has addressed clustering in heterogeneous information networks. A new algorithm, NetClus, was proposed within the past two years. Although NetClus handles heterogeneous information networks with a star network schema, considering the relations between center objects and all attribute objects linked to them, it ignores the relations among center objects, such as citation relations, which also carry rich information. Hence, we argue that the star network schema cannot characterize all possible relations without integrating the linkage structure among center objects, which we call the Center Linkage Structure, and no sufficiently practical method for handling it has existed so far. In this paper, we present a novel algorithm, BibClus, for clustering heterogeneous objects with a center linkage structure, taking a bibliographic information network as an example. In BibClus, we build a probabilistic model, a pairwise hidden Markov random field (P-HMRF), to characterize the center linkage structure, and convert it to a factor graph. We further combine the EM algorithm with factor graph theory and design an efficient message-passing procedure to infer marginal probabilities and estimate parameters at each EM iteration. We also study how factor functions with different functional forms and constraints affect clustering performance. To evaluate the proposed method, we conducted thorough experiments on a real dataset crawled from the ACM Digital Library. The experimental results show that BibClus is effective and achieves much higher quality than the recently proposed NetClus algorithm in both recall and precision.
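    The abstract relies on message passing to infer marginal cluster probabilities on a pairwise Markov random field built over linked center objects. As a rough illustration only, the sketch below runs generic sum-product (loopy) belief propagation on a pairwise MRF. It is not the authors' BibClus implementation: the Potts-style edge potential, the `coupling` parameter, the `unary` evidence lists, and the function name `loopy_bp` are all assumptions made for this example.

```python
import math
from collections import defaultdict


def loopy_bp(unary, edges, coupling, iters=50, tol=1e-8):
    """Sum-product loopy BP on a pairwise MRF (hypothetical sketch).

    unary:    per-node lists of K positive scores; unary[i][k] is node i's
              local evidence for cluster k (assumed, e.g. from attribute objects).
    edges:    undirected (i, j) pairs, e.g. citation links between center objects.
    coupling: assumed Potts potential psi(a, b) = exp(coupling) if a == b else 1.
    Returns approximate marginals (beliefs), one normalized list per node.
    """
    n, K = len(unary), len(unary[0])
    nbrs = defaultdict(list)
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    psi = [[math.exp(coupling) if a == b else 1.0 for b in range(K)] for a in range(K)]

    # msg[(i, j)][b]: message from node i to node j about label x_j = b, start uniform.
    msg = {(i, j): [1.0 / K] * K for i in list(nbrs) for j in nbrs[i]}
    for _ in range(iters):
        new, delta = {}, 0.0
        for (i, j) in msg:
            # Node i's evidence times all incoming messages except the one from j.
            pre = list(unary[i])
            for u in nbrs[i]:
                if u != j:
                    pre = [p * m for p, m in zip(pre, msg[(u, i)])]
            # Marginalize over x_i through the pairwise potential, then normalize.
            out = [sum(pre[a] * psi[a][b] for a in range(K)) for b in range(K)]
            z = sum(out)
            out = [o / z for o in out]
            delta = max(delta, max(abs(o - p) for o, p in zip(out, msg[(i, j)])))
            new[(i, j)] = out
        msg = new
        if delta < tol:  # messages stopped changing
            break

    beliefs = []
    for i in range(n):
        b = list(unary[i])
        for u in nbrs.get(i, []):
            b = [x * m for x, m in zip(b, msg[(u, i)])]
        z = sum(b)
        beliefs.append([x / z for x in b])
    return beliefs
```

    On a tree (for example, two linked papers) this sum-product procedure gives exact marginals; on graphs with cycles, such as dense citation networks, the beliefs are approximations. In an EM-style loop (again an assumption about how such a component could be used), the E-step would treat these beliefs as soft cluster memberships and the M-step would re-estimate `unary` and `coupling`.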
  • Keywords
    Markov processes; bibliographic systems; digital libraries; graph theory; inference mechanisms; iterative methods; message passing; pattern clustering; ACM digital library; BibClus; EM iteration; NetClus; bibliographic networks; center linkage structure; clustering algorithm; factor graph theory; heterogeneous information networks; inference marginal probabilities; message passing algorithm; multitype objects; multitype relations; pairwise hidden Markov random field; star network schema; Approximation algorithms; Clustering algorithms; Couplings; Hidden Markov models; Joints; Message passing; Probability; clustering; factor graph; heterogeneous information network; message passing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2011 IEEE 11th International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4577-2075-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2011.27
  • Filename
    6137291