DocumentCode :
172569
Title :
Building an Indonesian rule-based part-of-speech tagger
Author :
Rashel, Fam ; Luthfi, Andry ; Dinakaramani, Arawinda ; Manurung, Ruli
Author_Institution :
Fac. of Comput. Sci., Univ. Indonesia, Depok, Indonesia
fYear :
2014
fDate :
20-22 Oct. 2014
Firstpage :
70
Lastpage :
73
Abstract :
This paper describes work on a rule-based part-of-speech tagger for the Indonesian language. The system tokenizes documents, taking multi-word expressions into account, and recognizes named entities. It then applies tags to every token, proceeding from closed-class words to open-class words, and disambiguates the tags using a set of manually defined rules. The system currently obtains an accuracy of 79% on a manually tagged corpus of roughly 250,000 tokens.
Keywords :
knowledge based systems; natural language processing; Indonesian language; Indonesian rule-based part-of-speech tagger; closed-class words; multiword expression; named entity recognition; open-class words; rule-based approach; Accuracy; Buildings; Dictionaries; Natural language processing; Probabilistic logic; Speech; Tagging; disambiguation rule; part of speech tag; token;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Asian Language Processing (IALP), 2014 International Conference on
Conference_Location :
Kuching
Type :
conf
DOI :
10.1109/IALP.2014.6973521
Filename :
6973521