  • DocumentCode
    2346803
  • Title
    Document Topic Extraction Based on Wikipedia Category
  • Author
    Yun, Jiali; Jing, Liping; Yu, Jian; Huang, Houkuan; Zhang, Ying
  • Author_Institution
    Sch. of Comput. & Inf. Technol., Beijing Jiaotong Univ., Beijing, China
  • fYear
    2011
  • fDate
    15-19 April 2011
  • Firstpage
    852
  • Lastpage
    856
  • Abstract
    Document topic extraction aims to describe the topics of documents with a few key phrases. It can be applied to web document categorization and tagging, topic description of document clusters, and information retrieval tasks. In this paper, we propose a Wikipedia category-based document topic extraction method. Each document is mapped to a set of Wikipedia categories and represented as a graph structure in order to preserve the relationships between those categories. Document topics are then extracted by clustering the related Wikipedia categories across the document collection. Experiments on real data show that the Wikipedia category-based document topic extraction method achieves better results than latent topic modeling methods such as LDA.
  • Keywords
    Web sites; document handling; information retrieval; pattern clustering; Web document categorization; Wikipedia category-based document topic extraction method; document clustering; document topic extraction; information retrieval task; tagging; Data mining; Electronic publishing; Encyclopedias; Internet; Semantics; Sports equipment; Document Representation; Semantic Relatedness; Topic Extraction; Wikipedia Category
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2011 Fourth International Joint Conference on Computational Sciences and Optimization (CSO)
  • Conference_Location
    Yunnan
  • Print_ISBN
    978-1-4244-9712-6
  • Electronic_ISBN
    978-0-7695-4335-2
  • Type
    conf
  • DOI
    10.1109/CSO.2011.119
  • Filename
    5957791