DocumentCode
2708790
Title
On-line LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking
Author
AlSumait, Loulwah ; Barbara, Daniel ; Domeniconi, Carlotta
Author_Institution
Dept. of Comput. Sci., George Mason Univ., Fairfax, VA
fYear
2008
fDate
15-19 Dec. 2008
Firstpage
3
Lastpage
12
Abstract
This paper presents the online topic model (OLDA), a topic model that automatically captures the thematic patterns in text streams, identifies emerging topics, and tracks their changes over time. Our approach allows the topic modeling framework, specifically the latent Dirichlet allocation (LDA) model, to work in an online fashion such that it incrementally builds an up-to-date model (mixture of topics per document and mixture of words per topic) when a new document (or a set of documents) appears. A solution based on the empirical Bayes method is proposed. The idea is to incrementally update the current model according to the information inferred from the new stream of data, with no need to access previous data. The dynamics of the proposed approach also provide an efficient means to track the topics over time and detect emerging topics in real time. Our method is evaluated both qualitatively and quantitatively using benchmark datasets. In our experiments, OLDA discovered interesting patterns by analyzing only a fraction of the data at a time. Our tests also demonstrate the ability of OLDA to align topics across epochs, which captures the evolution of the topics over time. OLDA is also comparable to, and sometimes better than, the original LDA in predicting the likelihood of unseen documents.
Keywords
Bayes methods; data mining; text analysis; adaptive topic model; empirical Bayes method; latent Dirichlet allocation; online LDA; pattern discovery; text stream mining; topic detection; topic tracking; Application software; Benchmark testing; Computer science; Data mining; Linear discriminant analysis; Organizing; Pattern analysis; Software libraries; USA Councils; Yarn;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
Conference_Location
Pisa
ISSN
1550-4786
Print_ISBN
978-0-7695-3502-9
Type
conf
DOI
10.1109/ICDM.2008.140
Filename
4781095