DocumentCode
1666327
Title
Effects of Word Assignment in LDA for News Topic Discovery
Author
Chuen-Min Huang ; Cheng-Yi Wu
Author_Institution
Dept. of Inf. Manage., Nat. Yunlin Univ. of Sci. & Technol., Yunlin, Taiwan
fYear
2015
Firstpage
374
Lastpage
380
Abstract
In traditional LDA, latent variables are inferred under the "bag-of-words" assumption, in which word order is ignored. This assumption is valued for its computational efficiency, but it is regarded as impractical in many language-model applications where word order is essential. In this study, we proposed concatenating words into compounds based on morphological rules and built the connection between compounds and topics. We used Yahoo! Taiwan news from three categories (politics, economics, and life) published from May 23, 2013 to June 20, 2013. We also randomly extracted 1/3 of the news pool from each category to form a mixed dataset. We compared unigrams and compounds in terms of topic coherence and performance. The results show that the proposed model has a higher perplexity, while it conveys more accurate meaning and achieves higher computational efficiency than traditional LDA.
Keywords
information resources; natural language processing; text analysis; LDA; Yahoo! Taiwan news; bag-of-words; compounds; computational efficiency; economics; language model applications; latent Dirichlet allocation; latent variables; mixed dataset; morphological rules; news pool; news topic discovery; politics; topic coherence; unigrams; word assignment; word concatenation; word order; Analytical models; Coherence; Compounds; Computational efficiency; Computational modeling; Context; Data models; LDA; compounds-based; topic discovery; unigram;
fLanguage
English
Publisher
ieee
Conference_Titel
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location
New York, NY
Print_ISBN
978-1-4673-7277-0
Type
conf
DOI
10.1109/BigDataCongress.2015.62
Filename
7207246