Title of article :
Towards a standard Part of Speech tagset for the Arabic language
Author/Authors :
Zeroual, Imad (University Mohamed First, Faculty of Sciences, Department of Mathematics and Computer, Morocco) ; Lakhouaja, Abdelhak (University Mohamed First, Faculty of Sciences, Department of Mathematics and Computer, Morocco) ; Belahbib, Rachid (Doha Historical Dictionary of the Arabic Language, Qatar)
From page :
171
To page :
178
Abstract :
Part of Speech (PoS) tagging has still not been thoroughly investigated for the Arabic language. Determining the PoS tag of a word in a particular context is difficult, primarily because most contemporary texts are written without diacritics. Consequently, the same word may be spelled in different ways. Further, distinguishing between Arabic derivatives is a very challenging issue for the majority of PoS taggers. Hence, assigning the correct PoS tags requires advanced processing and considerable resources. This study aims to design detailed hierarchical levels of the Arabic tagset categories and their relationships. These hierarchical levels allow easier expansion when required and produce more accurate and precise results. They are based on a comparative study and on important references in Arabic grammar, and they have been validated by experts in this field. In addition, the proposed tagset is implemented in a PoS tagger and tested in various experiments. We believe that our study makes a significant contribution to the literature because it is a step towards a standard, rich, and comprehensive tagset for Arabic.
Keywords :
Natural Language Processing , Part of Speech , Tagging , Arabic tagset , TreeTagger
Journal title :
Journal of King Saud University - Computer and Information Sciences
Record number :
2713740