DocumentCode
3229358
Title
Mining Domain-Specific Thesauri from Wikipedia: A Case Study
Author
Milne, David ; Medelyan, Olena ; Witten, Ian H.
Author_Institution
Dept. of Comput. Sci., Waikato Univ., Hamilton
fYear
2006
fDate
18-22 Dec. 2006
Firstpage
442
Lastpage
448
Abstract
Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia. In a comparison with a professional thesaurus for agriculture we find that Wikipedia contains a substantial proportion of its concepts and semantic relations; furthermore it has impressive coverage of contemporary documents in the domain. Thesauri derived using our techniques capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly-constructed manual counterparts.
Keywords
Web sites; agriculture; data mining; document handling; thesauri; Wikipedia; domain-specific thesauri mining; knowledge structures; Art; Computer science; Content based retrieval; Information retrieval; Investments; Manuals; Natural languages
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
Conference_Location
Hong Kong
Print_ISBN
0-7695-2747-7
Type
conf
DOI
10.1109/WI.2006.119
Filename
4061409