DocumentCode
3722418
Title
An Unsupervised Approach for Identifying the Infobox Template of Wikipedia Article
Author
Hanif Bhuiyan;Kyeong-Jin Oh;Myung-Duk Hong;Geun-Sik Jo
Author_Institution
Dept. of Comput. Sci. &
fYear
2015
Firstpage
334
Lastpage
338
Abstract
Wikipedia infoboxes serve as an important source of structured information on the web. To author an infobox for a particular article, volunteers must spend a considerable amount of manual effort identifying the appropriate infobox template. An automatic process for identifying the infobox template would therefore benefit Wikipedia contributors. In this paper, we present a Natural Language Processing (NLP)-based automated approach that identifies the infobox template in an unsupervised fashion. The approach uses semantic relations (hyponymy and holonymy) and word features of Wikipedia articles. It works in three steps: first, it processes the raw text of the article to generate sets of words; next, it applies the proposed algorithm to identify the infobox type; finally, it selects the matching infobox template from a large pool of templates. A data-driven experiment demonstrates the effectiveness of the approach in terms of autonomy and accuracy.
Keywords
"Encyclopedias","Electronic publishing","Internet","Semantics","Manuals","Text categorization"
Publisher
ieee
Conference_Titel
Computational Science and Engineering (CSE), 2015 IEEE 18th International Conference on
Type
conf
DOI
10.1109/CSE.2015.47
Filename
7371393