DocumentCode
52610
Title
BTM: Topic Modeling over Short Texts
Author
Xueqi Cheng ; Xiaohui Yan ; Yanyan Lan ; Jiafeng Guo
Author_Institution
Inst. of Comput. Technol., Beijing, China
Volume
26
Issue
12
fYear
2014
fDate
Dec. 1 2014
Firstpage
2928
Lastpage
2941
Abstract
Short texts are popular on today's web, especially with the emergence of social media. Inferring topics from large-scale short texts has become a critical but challenging problem for many content analysis tasks. Conventional topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) learn topics from document-level word co-occurrences by modeling each document as a mixture of topics, whose inference suffers from the sparsity of word co-occurrence patterns in short texts. In this paper, we propose a novel approach to short text topic modeling, referred to as the biterm topic model (BTM). BTM learns topics by directly modeling the generation of word co-occurrence patterns (i.e., biterms) in the corpus, making the inference effective with the rich corpus-level information. To cope with large-scale short text data, we further introduce two online algorithms for BTM for efficient topic learning. Experiments on real-world short text collections show that BTM can discover more prominent and coherent topics, and significantly outperforms the state-of-the-art baselines. We also demonstrate the appealing performance of the two online BTM algorithms on both time efficiency and topic learning.
Keywords
content management; inference mechanisms; social networking (online); text analysis; word processing; biterm topic model; content analysis; corpus level information; inference mechanism; large scale short text data collection; online BTM algorithms; short text topic modeling; social media; time efficiency; topic learning; word co-occurrence patterns; Algorithm design and analysis; Analytical models; Context modeling; Data models; Inference algorithms; Semantics; Time complexity; Short text; biterm; content analysis; online algorithm; topic model;
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2014.2313872
Filename
6778764