Title :
A Graph-Based Bursty Topic Detection Approach in User-Generated Texts
Author :
Li Zhao ; Yan Li ; Xinran Liu ; Hong Zhang
Author_Institution :
Coordination Center of China, Nat. Comput. Network Emergency Response Tech. Team, Beijing, China
Abstract :
Detecting hot bursty topics in user-generated texts deserves great attention as Internet technologies proliferate. However, traditional document clustering and probabilistic topic models, which were developed for formal news articles, are less effective on informal user-generated corpora. In this paper, we take a graph-based perspective that reflects the latent pattern of bursty topics in a text stream, and we develop an effective solution to the bursty topic detection problem. We represent the texts and their topics as a directed, weighted graph in which the bursty words are the vertices and the Tversky index between bursty words gives the edge weights. Topic detection is then converted into dividing the constructed graph into separate subgraphs, with each significant subgraph corresponding to a bursty topic. To accomplish this, we partition the bursty word graph into its strongly connected components, based on the analysis that the important topical words within a graph are connected to each other with high weights and thus form strongly connected components. Experiments on two user-generated corpora, collected from English weblog and Chinese Weibo (microblog) sites, demonstrate that the proposed approach effectively detects hot bursty topics and is more appropriate than other topic detection models such as the LDA topic model and the EGF approach in the TDT project.
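The pipeline described in the abstract (bursty words as vertices, Tversky-index edge weights, strongly connected components as topics) can be illustrated with a minimal sketch. The abstract does not give the details, so everything specific here is an assumption made for illustration: each bursty word is represented by the set of document IDs containing it, the Tversky parameters `alpha` and `beta`, the edge `threshold`, the rule that a "significant" subgraph has at least two words, and all function names and sample data. Bursty word extraction itself is not shown, and components are found with Kosaraju's algorithm, which may differ from the authors' implementation.

```python
from itertools import permutations


def tversky(x_docs, y_docs, alpha=0.9, beta=0.1):
    """Tversky index of two sets; asymmetric whenever alpha != beta."""
    common = len(x_docs & y_docs)
    denom = common + alpha * len(x_docs - y_docs) + beta * len(y_docs - x_docs)
    return common / denom if denom else 0.0


def build_graph(word_docs, threshold=0.5, alpha=0.9, beta=0.1):
    """Keep a directed edge u -> v when its Tversky weight reaches the threshold."""
    graph = {word: [] for word in word_docs}
    for u, v in permutations(word_docs, 2):
        if tversky(word_docs[u], word_docs[v], alpha, beta) >= threshold:
            graph[u].append(v)
    return graph


def strongly_connected_components(graph):
    """Kosaraju's algorithm (recursive, so suited to small graphs only)."""
    order, seen = [], set()

    def visit(u):
        seen.add(u)
        for v in graph[u]:
            if v not in seen:
                visit(v)
        order.append(u)  # record finishing order

    for u in graph:
        if u not in seen:
            visit(u)

    # Reverse every edge, then collect components in decreasing finish order.
    reverse = {u: [] for u in graph}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].append(u)

    components, assigned = [], set()

    def collect(u, component):
        assigned.add(u)
        component.add(u)
        for v in reverse[u]:
            if v not in assigned:
                collect(v, component)

    for u in reversed(order):
        if u not in assigned:
            component = set()
            collect(u, component)
            components.append(component)
    return components


def detect_topics(word_docs, threshold=0.5, min_size=2):
    """Return each component with at least min_size words as a sorted word list."""
    graph = build_graph(word_docs, threshold)
    components = strongly_connected_components(graph)
    return sorted(sorted(c) for c in components if len(c) >= min_size)


# Hypothetical bursty words mapped to the IDs of documents that contain them.
EXAMPLE_WORD_DOCS = {
    "earthquake": {1, 2, 3},
    "rescue": {1, 2, 3, 4},
    "magnitude": {2, 3},
    "concert": {7, 8},
    "tickets": {7, 8, 9},
    "weather": {5},
}

topics = detect_topics(EXAMPLE_WORD_DOCS)
print(topics)
```

In this toy input, words that co-occur heavily end up mutually reachable through high-weight edges and are grouped together, while an isolated word such as "weather" forms a singleton component and is dropped by the size filter.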
Keywords :
Internet; directed graphs; indexing; natural language processing; social networking (online); text analysis; Chinese Weibo sites; EGF approach; English Weblog; Internet technologies proliferation; LDA topic model; LiveJournal Blog; Sina Weibo; TDT project; Tencent WeChat; Tversky index; bursty words; directed graph; graph-based bursty topic detection approach; latent bursty topic pattern; microblog sites; text stream; user-generated texts; weighted graph; Blogs; Feature extraction; Image color analysis; Image edge detection; Indexes; Nominations and elections; Probabilistic logic; Bursty Topic detection; Graph Theory; User-Generated Texts;
Conference_Titel :
Web Information System and Application Conference (WISA), 2014 11th
Print_ISBN :
978-1-4799-5726-2
DOI :
10.1109/WISA.2014.57