DocumentCode
172568
Title
Building an Indonesian named entity recognizer using Wikipedia and DBPedia
Author
Luthfi, Andry ; Distiawan, Bayu ; Manurung, Ruli
Author_Institution
Fac. of Comput. Sci., Univ. Indonesia, Depok, Indonesia
fYear
2014
fDate
20-22 Oct. 2014
Firstpage
19
Lastpage
22
Abstract
This paper describes the development of an Indonesian NER system using online data such as Wikipedia and DBPedia. The system is based on the Stanford NER system [8] and uses training documents constructed automatically from Wikipedia. Each entity in the Wikipedia documents, i.e. a word or phrase that has a hyperlink, is tagged according to information obtained from DBPedia. In this first version, we consider only three entity types: Person, Place, and Organization. The system is evaluated using cross-validation and against a manually annotated gold standard. Under cross-validation, our Indonesian NER obtains precision and recall values above 90%, whereas the evaluation against the gold standard shows that it achieves high precision but very low recall.
Keywords
Web sites; natural language processing; DBPedia; Indonesian NER system; Indonesian named entity recognizer; Stanford NER system; Wikipedia; cross fold validation; organization entity; person entity; place entity; precision value; recall value; Data models; Electronic publishing; Encyclopedias; Internet; Tagging; Training data; dbpedia; name entity recognition; stanford ner; wikipedia;
fLanguage
English
Publisher
IEEE
Conference_Titel
Asian Language Processing (IALP), 2014 International Conference on
Conference_Location
Kuching
Type
conf
DOI
10.1109/IALP.2014.6973520
Filename
6973520