• DocumentCode
    476196
  • Title
    A topic-based Document Correlation Model
  • Author
    Jia, Xi-ping ; Peng, Hong ; Zheng, Qi-Lun ; Jiang, Zhuo-lin ; Li, Zhao
  • Author_Institution
    Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou
  • Volume
    5
  • fYear
    2008
  • fDate
    12-15 July 2008
  • Firstpage
    2487
  • Lastpage
    2491
  • Abstract
    Document correlation analysis is now a focus of study in text mining. This paper proposes a Document Correlation Model to capture the correlation between documents at the topic level. The model represents the document correlation as the Optimal Matching of a bipartite graph, in which each partition is a document, each node is a topic, and each edge is weighted by the similarity between two topics. The topics of each document are retrieved by the Latent Dirichlet Allocation model and Gibbs sampling. Experiments on correlated document search show that the Document Correlation Model outperforms the Vector Space Model in two respects: 1) it has higher average retrieval precision; 2) it needs less space to store a document's information.
  • Keywords
    data mining; information retrieval; text analysis; Gibbs sampling; bipartite graph optimal matching; document correlation analysis; document retrieval; latent Dirichlet allocation model; text mining; topic-based document correlation model; Computer science; Cybernetics; Electronic mail; Information retrieval; Linear discriminant analysis; Machine learning; Optimal matching; Space technology; Text analysis; Text mining; Topic; document correlation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2008 International Conference on
  • Conference_Location
    Kunming
  • Print_ISBN
    978-1-4244-2095-7
  • Electronic_ISBN
    978-1-4244-2096-4
  • Type
    conf
  • DOI
    10.1109/ICMLC.2008.4620826
  • Filename
    4620826
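
The matching step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how topic similarity is measured or which matching algorithm is used, so cosine similarity between topic-word weight vectors and a brute-force search over assignments are assumptions made here for illustration. The names `cosine`, `document_correlation`, `doc_a` and `doc_b` are hypothetical, and the topic vectors are made-up toy values standing in for topics that would come from Latent Dirichlet Allocation with Gibbs sampling.

```python
from itertools import permutations
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two equal-length topic-word weight vectors.

    Assumption: the abstract does not name the topic similarity measure.
    """
    dot = sum(x * y for x, y in zip(p, q))
    norm = sqrt(sum(x * x for x in p)) * sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0


def document_correlation(topics_a, topics_b):
    """Return (score, pairs) for a maximum-weight bipartite matching of topics.

    Each document is one side of the graph, each topic is a node, and each
    edge weight is the similarity of the two topics it joins. Brute force
    over assignments; only practical for a handful of topics per document.
    """
    # Match every topic of the smaller document to a distinct topic of the larger.
    swapped = len(topics_a) > len(topics_b)
    small, large = (topics_b, topics_a) if swapped else (topics_a, topics_b)
    sim = [[cosine(s, t) for t in large] for s in small]

    best_score, best_perm = float("-inf"), ()
    for perm in permutations(range(len(large)), len(small)):
        score = sum(sim[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_perm = score, perm

    # Report pairs as (index in topics_a, index in topics_b).
    pairs = [(j, i) if swapped else (i, j) for i, j in enumerate(best_perm)]
    return best_score, pairs


# Toy topics over a four-word vocabulary (illustrative values only).
doc_a = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
doc_b = [[0.0, 0.0, 0.3, 0.7], [0.6, 0.3, 0.1, 0.0]]

score, pairs = document_correlation(doc_a, doc_b)
print(score, pairs)
```

For documents with many topics, a polynomial-time assignment solver would be a more suitable choice than enumerating permutations; the brute-force loop is used here only to keep the sketch free of dependencies.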