DocumentCode :
2772429
Title :
Knowledge Discovery from Citation Networks
Author :
Guo, Zhen ; Zhang, Zhongfei Mark ; Zhu, Shenghuo ; Chi, Yun ; Gong, Yihong
Author_Institution :
Comput. Sci. Dept., SUNY at Binghamton, Binghamton, NY, USA
fYear :
2009
fDate :
6-9 Dec. 2009
Firstpage :
800
Lastpage :
805
Abstract :
Knowledge discovery from scientific articles has received increasing attention recently, since huge repositories have been made available by the development of the Internet and digital databases. In a corpus of scientific articles such as a digital library, documents are connected by citations, and each document plays two different roles in the corpus: the document itself and a citation of other documents. In existing topic models, little effort is made to differentiate these two roles. We believe that the topic distributions of these two roles are different and related in a certain way. In this paper we propose a Bernoulli Process Topic (BPT) model, which models the corpus at two levels: the document level and the citation level. In the BPT model, each document has two different representations in the latent topic space, associated with its two roles. Moreover, the multilevel hierarchical structure of the citation network is captured by a generative process involving a Bernoulli process. The distribution parameters of the BPT model are estimated by a variational approximation approach. In addition to conducting experimental evaluations on the document modeling task, we also apply the BPT model to a well-known scientific corpus to discover the latent topics. Comparisons against state-of-the-art methods demonstrate very promising performance.
Keywords :
Internet; approximation theory; data mining; Bernoulli process topic; Internet; citation networks; digital databases; knowledge discovery; variational approximation; Computer science; Data mining; Databases; Graphical models; IP networks; Laboratories; Linear discriminant analysis; National electric code; Software libraries; Text mining; Unsupervised learning; latent models; text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
Conference_Location :
Miami, FL
ISSN :
1550-4786
Print_ISBN :
978-1-4244-5242-2
Electronic_ISBN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2009.137
Filename :
5360314