Title :
Building an Indonesian named entity recognizer using Wikipedia and DBPedia
Author :
Luthfi, Andry ; Distiawan, Bayu ; Manurung, Ruli
Author_Institution :
Fac. of Comput. Sci., Univ. Indonesia, Depok, Indonesia
Abstract :
This paper describes the development of an Indonesian named entity recognition (NER) system using online data such as Wikipedia and DBPedia. The system is based on the Stanford NER system [8] and uses training documents constructed automatically from Wikipedia. Each entity in the Wikipedia documents, i.e. each word or phrase that has a hyperlink, is tagged according to information obtained from DBPedia. In this first version, we consider only three entity types: Person, Place, and Organization. The system is evaluated using cross-validation and against a manually annotated gold standard. Under cross-validation, our Indonesian NER obtains precision and recall values above 90%, whereas the evaluation against the gold standard shows that it achieves high precision but very low recall.
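The automatic construction of training data described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that hyperlinks appear in wikitext form (`[[Target]]` or `[[Target|anchor]]`), that a DBPedia type lookup is available as a plain dictionary (the hypothetical `DBPEDIA_TYPES`, with made-up entries), and that the label names `PERSON`, `PLACE`, and `ORGANIZATION` stand in for the paper's three entity types. The output is one token per line, tab-separated from its label, with `O` for non-entity tokens, which is the kind of column format commonly used to train Stanford NER.

```python
import re

# Hypothetical stand-in for a DBPedia type lookup: article title -> entity class.
# The entries are illustrative only, not data from the paper.
DBPEDIA_TYPES = {
    "Joko Widodo": "PERSON",
    "Jakarta": "PLACE",
    "Universitas Indonesia": "ORGANIZATION",
}

# Matches wiki links of the form [[Target]] or [[Target|anchor text]] (assumed markup).
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def tag_wikitext(text, types=DBPEDIA_TYPES):
    """Return (token, label) pairs; tokens outside typed links are labeled 'O'."""
    pairs = []
    pos = 0
    for m in LINK.finditer(text):
        # Plain text before the link is background.
        pairs += [(tok, "O") for tok in text[pos:m.start()].split()]
        target = m.group(1).strip()
        anchor = m.group(2) or m.group(1)
        # Links whose target has no known type are also treated as background.
        label = types.get(target, "O")
        pairs += [(tok, label) for tok in anchor.split()]
        pos = m.end()
    # Remaining plain text after the last link.
    pairs += [(tok, "O") for tok in text[pos:].split()]
    return pairs


def to_tsv(pairs):
    """Serialize pairs as one 'token<TAB>label' line per token."""
    return "\n".join(f"{tok}\t{label}" for tok, label in pairs)


sentence = "[[Joko Widodo|Jokowi]] lahir di [[Surakarta]] dan bekerja di [[Jakarta]] ."
print(to_tsv(tag_wikitext(sentence)))
```

In this sketch the link to `Surakarta` is left as `O` because the illustrative dictionary has no type for it; how the actual system handles links without a usable DBPedia type is not stated in the abstract.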
Keywords :
Web sites; natural language processing; DBPedia; Indonesian NER system; Indonesian named entity recognizer; Stanford NER system; Wikipedia; cross fold validation; organization entity; person entity; place entity; precision value; recall value; Data models; Electronic publishing; Encyclopedias; Internet; Tagging; Training data; dbpedia; name entity recognition; stanford ner; wikipedia;
Conference_Title :
Asian Language Processing (IALP), 2014 International Conference on
Conference_Location :
Kuching
DOI :
10.1109/IALP.2014.6973520