DocumentCode :
52610
Title :
BTM: Topic Modeling over Short Texts
Author :
Xueqi Cheng ; Xiaohui Yan ; Yanyan Lan ; Jiafeng Guo
Author_Institution :
Inst. of Comput. Technol., Beijing, China
Volume :
26
Issue :
12
fYear :
2014
fDate :
Dec. 1 2014
Firstpage :
2928
Lastpage :
2941
Abstract :
Short texts are popular on today's web, especially with the emergence of social media. Inferring topics from large-scale short texts has become a critical but challenging task for many content analysis applications. Conventional topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) learn topics from document-level word co-occurrences by modeling each document as a mixture of topics, and their inference suffers from the sparsity of word co-occurrence patterns in short texts. In this paper, we propose a novel approach to short text topic modeling, referred to as the biterm topic model (BTM). BTM learns topics by directly modeling the generation of word co-occurrence patterns (i.e., biterms) in the corpus, making the inference effective with the rich corpus-level information. To cope with large-scale short text data, we further introduce two online algorithms for BTM for efficient topic learning. Experiments on real-world short text collections show that BTM can discover more prominent and coherent topics, and significantly outperforms the state-of-the-art baselines. We also demonstrate the appealing performance of the two online BTM algorithms on both time efficiency and topic learning.
Keywords :
content management; inference mechanisms; social networking (online); text analysis; word processing; biterm topic model; content analysis; corpus level information; inference mechanism; large scale short text data collection; online BTM algorithms; short text topic modeling; social media; time efficiency; topic learning; word co-occurrence patterns; Algorithm design and analysis; Analytical models; Context modeling; Data models; Inference algorithms; Semantics; Time complexity; Short text; biterm; content analysis; online algorithm; topic model;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2014.2313872
Filename :
6778764