DocumentCode
35886
Title
Unsupervised Web Topic Detection Using A Ranked Clustering-Like Pattern Across Similarity Cascades
Author
Junbiao Pang ; Fei Jia ; Chunjie Zhang ; Weigang Zhang ; Qingming Huang ; Baocai Yin
Author_Institution
Beijing Key Lab. of Multimedia & Intell. Software Technol., Beijing Univ. of Technol., Beijing, China
Volume
17
Issue
6
fYear
2015
fDate
June 2015
Firstpage
843
Lastpage
853
Abstract
Despite the massive growth of social media on the Internet, the process of organizing, understanding, and monitoring user generated content (UGC) has become one of the most pressing problems in today's society. Discovering topics on the web from a huge volume of UGC is one of the promising approaches to achieve this goal. Compared with classical topic detection and tracking in news articles, identifying topics on the web is by no means easy due to the noisy, sparse, and less-constrained data on the Internet. In this paper, we investigate methods from the perspective of similarity diffusion, and propose a clustering-like pattern across similarity cascades (SCs). SCs are a series of subgraphs generated by truncating a similarity graph with a set of thresholds, and then maximal cliques are used to capture topics. Finally, a topic-restricted similarity diffusion process is proposed to efficiently identify real topics from a large number of candidates. Experiments demonstrate that our approach outperforms the state-of-the-art methods on three public data sets.
Keywords
Internet; information retrieval; network theory (graphs); pattern clustering; social networking (online); Internet; UGC; ranked clustering-like pattern across similarity cascade; similarity graph; social media; topic restricted similarity diffusion process; unsupervised Web topic detection; user generated content; Accidents; Clustering algorithms; Media; Noise measurement; Organizing; Semantics; Visualization; Maximal clique; Poisson deconvolution; similarity cascade (SC); unsupervised ranking; web topic detection;
fLanguage
English
Journal_Title
Multimedia, IEEE Transactions on
Publisher
ieee
ISSN
1520-9210
Type
jour
DOI
10.1109/TMM.2015.2425143
Filename
7091017