Title :
Building an Indonesian rule-based part-of-speech tagger
Author :
Rashel, Fam ; Luthfi, Andry ; Dinakaramani, Arawinda ; Manurung, Ruli
Author_Institution :
Fac. of Comput. Sci., Univ. Indonesia, Depok, Indonesia
Abstract :
This paper describes a rule-based part-of-speech tagger for the Indonesian language. The system tokenizes documents, taking multi-word expressions into account, and recognizes named entities. It then assigns tags to every token, proceeding from closed-class words to open-class words, and disambiguates the tags using a set of manually defined rules. The system currently achieves an accuracy of 79% on a manually tagged corpus of roughly 250,000 tokens.
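The pipeline described in the abstract (tokenize with multi-word expressions, look up closed-class words before open-class words, then disambiguate with hand-written rules) might be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the tiny lexicons, the tag names (PRON, MODAL, VERB, NOUN, INTJ), the single disambiguation rule, and all function names are hypothetical and are not taken from the paper. Named entity recognition is omitted.

```python
# Hypothetical toy resources; the paper's real dictionaries, tagset and
# rules are not given in this record.
MULTI_WORD = {("terima", "kasih")}            # "thank you", kept as one token
CLOSED_CLASS = {"saya": {"PRON"}, "terima kasih": {"INTJ"}}
OPEN_CLASS = {"membaca": {"VERB"}, "buku": {"NOUN"},
              "bisa": {"NOUN", "MODAL"}}      # "venom" or "can": ambiguous


def tokenize(text):
    """Split on whitespace, merging known two-word expressions."""
    words = text.lower().split()
    tokens, i = [], 0
    while i < len(words):
        if tuple(words[i:i + 2]) in MULTI_WORD:
            tokens.append(" ".join(words[i:i + 2]))
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def candidate_tags(token):
    """Consult the closed-class lexicon first, then the open-class one."""
    if token in CLOSED_CLASS:
        return set(CLOSED_CLASS[token])
    if token in OPEN_CLASS:
        return set(OPEN_CLASS[token])
    return {"NOUN"}  # assumed fallback for unknown words


def disambiguate(tokens, candidates):
    """Reduce each candidate set to one tag with a hand-written rule."""
    tagged = []
    for i, (token, tags) in enumerate(zip(tokens, candidates)):
        if len(tags) > 1:
            following = candidates[i + 1] if i + 1 < len(candidates) else set()
            # Hypothetical rule: a possible modal directly before a verb
            # is tagged as a modal.
            if "MODAL" in tags and following == {"VERB"}:
                tags = {"MODAL"}
            else:
                tags = {sorted(tags)[0]}  # arbitrary but deterministic choice
        tagged.append((token, next(iter(tags))))
    return tagged


def tag(text):
    tokens = tokenize(text)
    return disambiguate(tokens, [candidate_tags(t) for t in tokens])
```

For instance, `tag("saya bisa membaca buku")` keeps both readings of "bisa" after the lexicon lookup, and the rule resolves it to the modal reading because a verb follows.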
Keywords :
knowledge based systems; natural language processing; Indonesian language; Indonesian rule-based part-of-speech tagger; closed-class words; multiword expression; named entity recognition; open-class words; rule-based approach; Accuracy; Buildings; Dictionaries; Natural language processing; Probabilistic logic; Speech; Tagging; disambiguation rule; part of speech tag; token
Conference_Title :
Asian Language Processing (IALP), 2014 International Conference on
Conference_Location :
Kuching
DOI :
10.1109/IALP.2014.6973521