Title :
RankTopic: Ranking Based Topic Modeling
Author :
Dongsheng Duan ; Yuhua Li ; Ruixuan Li ; Rui Zhang ; Aiming Wen
Author_Institution :
Sch. of Comput. Sci. & Technol., Huazhong Univ. of Sci. & Technol., Wuhan, China
Abstract :
Topic modeling has become a widely used tool for document management due to its superior performance. However, few topic models distinguish the importance of documents on different topics. In this paper, we investigate how to use the importance of documents to improve topic modeling and propose incorporating link-based ranking into topic modeling. Specifically, topical PageRank is used to compute the topic-level ranking of documents, which indicates the importance of documents on different topics. By treating the topical ranking of a document as the probability that the document is involved in the corresponding topic, a generalized relation is built between ranking and topic modeling. Based on this relation, a ranking-based topic model, RankTopic, is proposed. With RankTopic, a mutual enhancement framework is established between ranking and topic modeling. Extensive experiments on paper citation data and Twitter data are conducted to compare the performance of RankTopic with that of some state-of-the-art topic models. Experimental results show that, with a properly set balancing parameter, RankTopic performs much better than some baseline models and is comparable with the state-of-the-art link-combined relational topic model (RTM) in generalization performance, document clustering, and classification. It is also demonstrated both quantitatively and qualitatively that topics detected by RankTopic are more interpretable than those detected by some baseline models and remain competitive with those detected by RTM.
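The abstract's idea of a topic-level ranking that can be read as a per-topic probability over documents can be illustrated with a small sketch. This is not the paper's topical PageRank formulation; it is a simplified topic-sensitive PageRank in which, for each topic, the teleport distribution is assumed to be proportional to each document's topic proportion. The function name `topical_pagerank`, the toy citation graph, and the topic proportions are all hypothetical.

```python
def topical_pagerank(links, doc_topics, damping=0.85, iters=100):
    """Simplified topic-sensitive PageRank (an illustrative assumption,
    not the exact method from the paper).

    links:      dict mapping a document to the list of documents it cites
    doc_topics: dict mapping a document to its list of topic proportions
    Returns a dict: topic index -> {document: rank}, where the ranks for
    each topic sum to 1 and can be read as a distribution over documents.
    """
    docs = sorted(doc_topics)
    n_topics = len(next(iter(doc_topics.values())))
    ranks = {}
    for k in range(n_topics):
        # Teleport to a document in proportion to its weight on topic k.
        total = sum(doc_topics[d][k] for d in docs)
        teleport = {d: doc_topics[d][k] / total for d in docs}
        rank = dict(teleport)
        for _ in range(iters):
            new = {d: (1 - damping) * teleport[d] for d in docs}
            for src in docs:
                targets = links.get(src, [])
                if targets:
                    # Split the damped rank evenly among cited documents.
                    share = damping * rank[src] / len(targets)
                    for dst in targets:
                        new[dst] += share
                else:
                    # Dangling document: spread its rank via the teleport vector.
                    for d in docs:
                        new[d] += damping * rank[src] * teleport[d]
            rank = new
        ranks[k] = rank
    return ranks


# Hypothetical toy data: "c" is cited by topic-0 documents, "e" by a topic-1 document.
links = {"a": ["c"], "b": ["c"], "c": [], "d": ["e"], "e": []}
doc_topics = {
    "a": [0.9, 0.1],
    "b": [0.8, 0.2],
    "c": [0.9, 0.1],
    "d": [0.1, 0.9],
    "e": [0.2, 0.8],
}
ranks = topical_pagerank(links, doc_topics)
top_per_topic = {k: max(r, key=r.get) for k, r in ranks.items()}
```

Because each topic's ranks sum to 1, they can be treated as the probability of a document given a topic, which is the kind of link between ranking and topic modeling the abstract describes.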
Keywords :
document handling; generalisation (artificial intelligence); pattern classification; pattern clustering; probability; RankTopic tool; Twitter data; balancing parameter; document classification; document clustering; document importance; document management; document ranking; generalization performance; generalized relation; mutual enhancement framework; paper citation data; probability; ranking based topic modeling; relational topic model; topical pagerank; Computational modeling; Data models; Educational institutions; Equations; Mathematical model; Noise; Web pages; Classification; Clustering; Document Network; Ranking; Topic Modeling;
Conference_Title :
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
Print_ISBN :
978-1-4673-4649-8
DOI :
10.1109/ICDM.2012.12