Title :
The development of a fine grained class set for Amazigh POS tagging
Author :
Outahajala, Mohamed ; Benajiba, Yassine ; Zenkouar, Lahbib ; Rosso, Paolo
Author_Institution :
Lab. Electron. et Commun., Univ. Mohammed V Agdal, Morocco
Abstract :
Like most languages that have only recently begun to be investigated for Natural Language Processing (NLP) tasks, Amazigh suffers from a scarcity of annotated corpora and of linguistic tools and resources. The main aim of this paper is to present a tokenizer and a new part-of-speech (POS) tagger based on a new Amazigh tag set (AMTS) composed of 28 tags. To this end, we trained two sequence classification models, using Support Vector Machines (SVMs) and Conditional Random Fields (CRFs), to build a tokenizer and a POS tagger for the Amazigh language. We used 10-fold cross-validation to evaluate and validate our approach. The POS tagging results obtained with SVMs and CRFs are very comparable: CRFs outperformed SVMs both at the fold level (91.18% vs. 90.75%) and on the 10-fold average (87.95% vs. 87.11%). On the tokenization task, SVMs outperformed CRFs both at the fold level (99.97% vs. 99.85%) and on the 10-fold average (99.95% vs. 99.89%).
Keywords :
natural language processing; pattern classification; random processes; support vector machines; AMTS; Amazigh POS tagging; Amazigh language; Amazigh tag set; CRFs; NLP tasks; POS tagger; SVMs; annotated corpora; conditional random fields; fine grained class set; linguistic tools; natural language processing task; part-of-speech tagger; sequence classification models; tokenizer tool; tokenizer; Dictionaries; Hidden Markov models; Manuals; Pragmatics; Tagging; Training; Annotation process; POS tagging; Tokenization; supervised learning
Conference_Title :
Computer Systems and Applications (AICCSA), 2013 ACS International Conference on
Conference_Location :
Ifrane
DOI :
10.1109/AICCSA.2013.6616440