• DocumentCode
    51145
  • Title

    Top-k Similarity Join in Heterogeneous Information Networks

  • Author

    Yun Xiong ; Yangyong Zhu ; Philip S. Yu

  • Author_Institution
    Shanghai Key Lab. of Data Sci., Fudan Univ., Shanghai, China
  • Volume
    27
  • Issue
    6
  • fYear
    2015
  • fDate
    June 1 2015
  • Firstpage
    1710
  • Lastpage
    1723
  • Abstract
    As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. In particular, none of the existing research on similarity join takes the different semantic meanings behind paths into consideration, and almost all of it completely ignores the heterogeneity and diversity of HINs. In this paper, we propose a path-based similarity join (PS-join) method to return the top-k similar pairs of objects based on any user-specified join path in a heterogeneous information network. We study how to prune expensive similarity computation by introducing bucket-pruning-based locality-sensitive hashing (BPLSH) indexing. Compared with the existing link-based similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.
  • Keywords
    data mining; database indexing; graph theory; information networks; BPLSH indexing; HIN; PS-join method; bucket pruning based locality sensitive hashing indexing; classification; clustering; data mining tasks; heterogeneous information networks; path-based similarity join method; similarity search; top-k similarity join; Data engineering; Data mining; Indexing; Knowledge engineering; Search problems; Semantics; Vectors; Heterogeneous network; graph; heterogeneous network; similarity join;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2014.2373385
  • Filename
    6963491