• DocumentCode
    3439360
  • Title
    Pattern-Based Topic Models for Information Filtering
  • Author
    Gao, Yang ; Xu, Yue ; Li, Yuefeng
  • Author_Institution
    Fac. of Sci. & Eng., QUT, Brisbane, QLD, Australia
  • fYear
    2013
  • fDate
    7-10 Dec. 2013
  • Firstpage
    921
  • Lastpage
    928
  • Abstract
    Topic modelling, such as Latent Dirichlet Allocation (LDA), was proposed to generate statistical models that represent multiple topics in a collection of documents, and it has been widely used in fields such as machine learning and information retrieval. However, its effectiveness in information filtering is largely unknown. Patterns are generally thought to be more representative than single terms for representing documents. In this paper, a novel information filtering model, the Pattern-based Topic Model (PBTM), is proposed to represent text documents using not only topic distributions at a general level but also semantic pattern representations at a detailed, specific level, both of which contribute to accurate document representation and document relevance ranking. Extensive experiments are conducted to evaluate the effectiveness of PBTM using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model achieves outstanding performance.
  • Keywords
    information filtering; pattern classification; text analysis; LDA; PBTM; TREC data collection Reuters Corpus Volume 1; document relevance ranking; document representation; information filtering model; latent Dirichlet allocation; pattern-based topic models; semantic pattern representations; statistical models; text documents; topic distributions; Data mining; Data models; Itemsets; Mathematical model; Semantics; Taxonomy; Training; Topic models; closed pattern; pattern mining; user modelling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on
  • Conference_Location
    Dallas, TX
  • Print_ISBN
    978-1-4799-3143-9
  • Type
    conf
  • DOI
    10.1109/ICDMW.2013.30
  • Filename
    6754020