Mining Domain-Specific Thesauri from Wikipedia: A Case Study

Author

Milne, David ; Medelyan, Olena ; Witten, Ian H.

Author_Institution

Dept. of Comput. Sci., Waikato Univ., Hamilton

fYear

2006

fDate

18-22 Dec. 2006

Firstpage

442

Lastpage

448

Abstract

Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia. In a comparison with a professional thesaurus for agriculture, we find that Wikipedia contains a substantial proportion of its concepts and semantic relations; furthermore, it has impressive coverage of contemporary documents in the domain. Thesauri derived using our techniques capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly constructed manual counterparts.

Keywords

Web sites; agriculture; data mining; document handling; thesauri; Wikipedia; domain-specific thesauri mining; knowledge structures; Art; Computer science; Content based retrieval; Information retrieval; Investments; Manuals; Natural languages

fLanguage

English

Publisher

IEEE

Conference_Titel

Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on

Conference_Location

Hong Kong

Print_ISBN

0-7695-2747-7

Type

conf

DOI

10.1109/WI.2006.119

Filename

4061409

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3229358