  • DocumentCode
    507486
  • Title
    Using Hyperlink Texts to Improve Quality of Identifying Document Topics Based on Wikipedia
  • Author
    Huynh, Dat T.; Cao, Tru H.; Pham, Phuong H. T.; Hoang, Toan N.
  • Author_Institution
    Fac. of Comput. Sci. & Eng., Ho Chi Minh City Univ. of Technol., Ho Chi Minh City, Vietnam
  • fYear
    2009
  • fDate
    13-17 Oct. 2009
  • Firstpage
    249
  • Lastpage
    254
  • Abstract
    This paper presents a method for identifying the topics of documents based on the Wikipedia category network. It improves the method previously proposed by Schonhofen by taking into account the weights of words in the hyperlink texts of Wikipedia articles. Experiments carried out on the computing and team sport domains showed that our proposed method outperforms Schonhofen's.
  • Keywords
    information retrieval; text analysis; Wikipedia articles; Wikipedia category network; computing domains; document topic identification; hyperlink texts; team sport domains; Computer science; Crawlers; Humans; Knowledge engineering; Machine learning; Machine learning algorithms; Ontologies; Systems engineering and theory; Web sites; Wikipedia
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge and Systems Engineering, 2009. KSE '09. International Conference on
  • Conference_Location
    Hanoi
  • Print_ISBN
    978-1-4244-5086-2
  • Electronic_ISBN
    978-0-7695-3846-4
  • Type
    conf
  • DOI
    10.1109/KSE.2009.20
  • Filename
    5361697