• DocumentCode
    35886
  • Title

    Unsupervised Web Topic Detection Using A Ranked Clustering-Like Pattern Across Similarity Cascades

  • Author

    Junbiao Pang ; Fei Jia ; Chunjie Zhang ; Weigang Zhang ; Qingming Huang ; Baocai Yin

  • Author_Institution
    Beijing Key Lab. of Multimedia & Intell. Software Technol., Beijing Univ. of Technol., Beijing, China
  • Volume
    17
  • Issue
    6
  • fYear
    2015
  • fDate
    Jun-15
  • Firstpage
    843
  • Lastpage
    853
  • Abstract
    Despite the massive growth of social media on the Internet, the process of organizing, understanding, and monitoring user generated content (UGC) has become one of the most pressing problems in today's society. Discovering topics on the web from a huge volume of UGC is one of the promising approaches to achieve this goal. Compared with classical topic detection and tracking in news articles, identifying topics on the web is by no means easy due to the noisy, sparse, and less-constrained data on the Internet. In this paper, we investigate methods from the perspective of similarity diffusion, and propose a clustering-like pattern across similarity cascades (SCs). SCs are a series of subgraphs generated by truncating a similarity graph with a set of thresholds, and then maximal cliques are used to capture topics. Finally, a topic-restricted similarity diffusion process is proposed to efficiently identify real topics from a large number of candidates. Experiments demonstrate that our approach outperforms the state-of-the-art methods on three public data sets.
  • Keywords
    Internet; information retrieval; network theory (graphs); pattern clustering; social networking (online); UGC; ranked clustering-like pattern across similarity cascade; similarity graph; social media; topic restricted similarity diffusion process; unsupervised Web topic detection; user generated content; Accidents; Clustering algorithms; Media; Noise measurement; Organizing; Semantics; Visualization; Maximal clique; Poisson deconvolution; similarity cascade (SC); unsupervised ranking; web topic detection
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type

    jour

  • DOI
    10.1109/TMM.2015.2425143
  • Filename
    7091017
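
  • Illustration
    The abstract describes similarity cascades as subgraphs obtained by truncating a similarity graph at a set of thresholds, with maximal cliques in each subgraph serving as candidate topics. The following is a minimal sketch of that idea only, not the authors' implementation. The function names (`threshold_graph`, `maximal_cliques`, `candidate_topics`), the dense-matrix input, the `>=` truncation rule, and the `min_size` filter are all assumptions made for illustration. The topic-restricted similarity diffusion and ranking step mentioned in the abstract is not sketched here, because the record gives no details about it.

```python
from typing import Dict, FrozenSet, List, Sequence, Set


def threshold_graph(sim: Sequence[Sequence[float]], tau: float) -> Dict[int, Set[int]]:
    """Keep an undirected edge (i, j) only if sim[i][j] >= tau (assumed rule)."""
    n = len(sim)
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= tau:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def maximal_cliques(adj: Dict[int, Set[int]]) -> List[FrozenSet[int]]:
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques: List[FrozenSet[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        # r: current clique, p: candidates to extend it, x: already explored
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def candidate_topics(
    sim: Sequence[Sequence[float]],
    thresholds: Sequence[float],
    min_size: int = 2,  # hypothetical filter: ignore isolated items
) -> Set[FrozenSet[int]]:
    """Collect maximal cliques from every level of the cascade."""
    topics: Set[FrozenSet[int]] = set()
    for tau in thresholds:
        adj = threshold_graph(sim, tau)
        for clique in maximal_cliques(adj):
            if len(clique) >= min_size:
                topics.add(clique)
    return topics
```

    Each threshold yields one subgraph of the cascade. A higher threshold gives a sparser subgraph and therefore smaller, tighter cliques, so the same item can appear in several candidate topics across levels. Maximal clique enumeration is exponential in the worst case, so this sketch is only suitable for small graphs.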