DocumentCode :
3599773
Title :
Topic Detection from Microblog Based on Text Clustering and Topic Model Analysis
Author :
Siqi Huang ; Yitao Yang ; Huakang Li ; Guozi Sun
Author_Institution :
Nanjing Univ. of Posts & Telecommun., Nanjing, China
fYear :
2014
Firstpage :
88
Lastpage :
92
Abstract :
This paper proposes a Microblog topic detection method based on text clustering and topic model analysis. It addresses the problem that traditional topic detection methods are designed mainly for traditional media text and are not very effective at handling sparse Microblog short texts. Because Microblog data is structured and carries rich inter-textual contextual information such as retweets, comments, user hashtags, and embedded link URLs, we first put forward a feature weight pre-processing method. We also use a clustering algorithm based on word vectors to enrich the feature information of the data. On this basis, we extend the conventional LDA (Latent Dirichlet Allocation) topic model to extract the hot topics in the Microblog data. Compared with traditional methods, the proposed method is much more effective on a text corpus collected from Sina Microblog.
Keywords :
Web sites; pattern clustering; text analysis; LDA; Microblog topic detection method; Sina Microblog; data structure; feature information; feature weight preprocessing method; intertextual contextual information; latent dirichlet allocation; sparse Microblog short texts; text clustering; text media; topic detection method; topic model analysis; word vectors; Analytical models; Clustering algorithms; Data mining; Data models; Mathematical model; Semantics; Twitter; LDA; Microblog; text clustering; topic detection;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Services Computing Conference (APSCC), 2014 Asia-Pacific
Type :
conf
DOI :
10.1109/APSCC.2014.18
Filename :
7175500