DocumentCode :
3722418
Title :
An Unsupervised Approach for Identifying the Infobox Template of Wikipedia Article
Author :
Hanif Bhuiyan;Kyeong-Jin Oh;Myung-Duk Hong;Geun-Sik Jo
Author_Institution :
Dept. of Comput. Sci. &
fYear :
2015
Firstpage :
334
Lastpage :
338
Abstract :
Wikipedia infoboxes serve as an important source of structured information on the web. To author an infobox for a particular article, volunteers need a considerable amount of manual effort to identify the appropriate infobox template. An automatic process for identifying the infobox template could therefore benefit Wikipedia contributors. In this paper, we present a Natural Language Processing (NLP)-based automated approach that identifies the infobox template in an unsupervised fashion. The approach uses semantic relations (hyponym and holonym) and word features of Wikipedia articles. It works in three steps: first, it processes the raw text of the article to generate sets of words; next, it applies the proposed algorithm to identify the infobox type; finally, it selects the infobox template from the large pool of available templates. A data-driven experiment demonstrates the effectiveness of the approach in terms of autonomy and accuracy.
Keywords :
"Encyclopedias","Electronic publishing","Internet","Semantics","Manuals","Text categorization"
Publisher :
IEEE
Conference_Title :
Computational Science and Engineering (CSE), 2015 IEEE 18th International Conference on
Type :
conf
DOI :
10.1109/CSE.2015.47
Filename :
7371393