• DocumentCode
    3229358
  • Title

    Mining Domain-Specific Thesauri from Wikipedia: A Case Study

  • Author

    Milne, David ; Medelyan, Olena ; Witten, Ian H.

  • Author_Institution
    Dept. of Comput. Sci., Waikato Univ., Hamilton
  • fYear
    2006
  • fDate
    18-22 Dec. 2006
  • Firstpage
    442
  • Lastpage
    448
  • Abstract
    Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia. In a comparison with a professional thesaurus for agriculture we find that Wikipedia contains a substantial proportion of its concepts and semantic relations; furthermore it has impressive coverage of contemporary documents in the domain. Thesauri derived using our techniques capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly-constructed manual counterparts.
  • Keywords
    Web sites; agriculture; data mining; document handling; thesauri; Wikipedia; domain-specific thesauri mining; knowledge structures; Art; Computer science; Content based retrieval; Information retrieval; Investments; Manuals; Natural languages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    0-7695-2747-7
  • Type

    conf

  • DOI
    10.1109/WI.2006.119
  • Filename
    4061409