DocumentCode :
1951825
Title :
A comparative study of topic models for topic clustering of Chinese web news
Author :
Wu, Yonghui ; Ding, Yuxin ; Wang, Xiaolong ; Xu, Jun
Author_Institution :
Harbin Inst. of Technol., Harbin, China
Volume :
5
fYear :
2010
fDate :
9-11 July 2010
Firstpage :
236
Lastpage :
240
Abstract :
Topic models are an increasingly useful tool for analyzing semantic-level meaning and capturing topical features. However, there has been little research comparing topic models. In this paper, we describe a comparative study of three topic models in the extrinsic application of topic clustering. A topic model distance is defined on the converged parameters of the topic models and used for topic clustering. The topic models are then compared using the clustering results obtained from the corresponding topic distance matrices. A series of comparative experiments is carried out on a corpus of 5033 web news articles from 30 topics, using cosine distance as the baseline. Web page collections with different numbers of topics and documents are used in the experiments. The experimental results show that topic clustering using topic distance achieves better precision and recall on the data set containing related topics. Topic clustering using topic distance benefits from the topic features captured by topic models. The complex topic model does provide more help than the simple topic model in topic clustering.
Keywords :
Internet; information retrieval; pattern clustering; search engines; Chinese Web news; Web page collections; cosine distance; data set; semantic level meanings; topic clustering; topic distance matrix; topic features; topic models; Artificial neural networks; clustering; comparative study; distance measure; topic model;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-5537-9
Type :
conf
DOI :
10.1109/ICCSIT.2010.5564723
Filename :
5564723