Title :
Probabilistic Document Correlation Model
Author :
Jia, Xiping ; Peng, Hong
Author_Institution :
South China Univ. of Technol., Guangzhou
Abstract :
The vector space model (VSM) and related models have recently become popular for analyzing document relationships in text mining. However, they fail to capture document correlation at the topic level. This paper proposes a probabilistic document correlation (PDC) model that captures document correlation based on topics. The PDC model defines document correlation through the posterior probability of documents, and the posterior probability of each document is resolved by introducing the posterior probability of topics and topic similarity. Latent Dirichlet allocation (LDA), a generative topic model, is used for topic retrieval. Experiments on correlated document search show that the PDC model outperforms the VSM in average retrieval precision and in document compression.
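The abstract does not give the PDC formulas, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that LDA has already produced a document-topic matrix theta (row d holds P(topic | d)) and a topic-word matrix phi (row k holds P(word | topic k)). It also assumes, as a hypothetical choice, that topic similarity is the cosine similarity between rows of phi, and that the correlation of document d' with a query document d is the bilinear score theta_d^T S theta_d', normalized over all other documents into a posterior-like distribution. The function names topic_similarity and document_posterior are illustrative only.

```python
import numpy as np


def topic_similarity(phi):
    """Cosine similarity between topic-word distributions (rows of phi).

    Hypothetical choice: the paper's own topic-similarity measure may differ.
    """
    unit = phi / np.linalg.norm(phi, axis=1, keepdims=True)
    return unit @ unit.T  # (K, K) topic-topic similarity matrix


def document_posterior(theta, phi, query):
    """Posterior-like distribution over documents for a query document.

    theta: (D, K) array, row d holds P(topic | document d), e.g. from LDA.
    phi:   (K, V) array, row k holds P(word | topic k).
    query: index of the query document in theta.
    Assumed scoring rule (not from the paper): theta_d^T S theta_query.
    """
    S = topic_similarity(phi)
    scores = theta @ S @ theta[query]  # (D,) topic-level affinity to the query
    scores[query] = 0.0                # exclude the query document itself
    return scores / scores.sum()       # normalize so the scores sum to 1


# Toy usage: documents 0 and 1 lean toward topic 0, document 2 toward topic 1.
theta = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
phi = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
print(document_posterior(theta, phi, query=0))
```

Because the score is computed in topic space rather than over raw term vectors, two documents can be ranked as correlated even when they share few words, provided their topic mixtures (or similar topics) overlap; this is the kind of topic-level relationship the abstract says the VSM misses.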
Keywords :
information retrieval; probability; text analysis; vectors; latent Dirichlet allocation; posterior probability of documents; probabilistic document correlation model; text mining; topic retrieval; topic similarity; vector space model; Bipartite graph; Computational intelligence; Computer science; Computer security; Functional analysis; Optimal matching; Space technology; Text analysis; Text mining; Vocabulary;
Conference_Title :
Computational Intelligence and Security Workshops, 2007. CISW 2007. International Conference on
Conference_Location :
Heilongjiang
Print_ISBN :
978-0-7695-3073-4
DOI :
10.1109/CISW.2007.4425527