Title of article :
A Markovian Approach for Arabic Root Extraction
Author/Authors :
Boudlal, Abderrahim (University Mohamed I, Faculty of Letters and Human Sciences, Morocco); Belahbib, Rachid (Qatar University, College of Arts and Sciences, Qatar); Lakhouaja, Abdelhak (University Mohamed I, Department of Mathematics and Computer Sciences, Morocco); Mazroui, Azzeddine (University Mohamed I, Department of Mathematics and Computer Sciences, Morocco)
From page :
91
To page :
98
Abstract :
In this paper, we present an Arabic morphological analysis system that assigns to each word of an unvoweled Arabic sentence a unique root, depending on the context. The proposed system is composed of two modules. The first performs an out-of-context analysis: it segments each word of the sentence into its elementary morphological units in order to identify its possible roots. For this, we adopt a segmentation of the word into three parts (prefix, stem, and suffix). The second module uses the context to identify the correct root among all the possible roots of the word. For this purpose, we use a Hidden Markov Model (HMM) approach, in which the observations are the words and the possible roots are the hidden states. We validate the approach on the NEMLAR Arabic writing corpus, which consists of 500,000 words. The system gives the correct root for more than 98% of the words in the training set and for almost 94% of the words in the testing set.
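The second module can be pictured as Viterbi decoding over a lattice whose states at each position are only the candidate roots proposed by the first module. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a first-order HMM with hypothetical tables `initial` (P(root)), `transition` (P(root_t | root_t-1)) and `emission` (P(word | root)) already estimated from an annotated corpus, and it uses a hypothetical probability floor in place of whatever smoothing the paper actually applies. The function name `viterbi_roots` and all parameter names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi_roots(
    words: Sequence[str],
    candidates: Dict[str, List[str]],
    initial: Dict[str, float],
    transition: Dict[Tuple[str, str], float],
    emission: Dict[Tuple[str, str], float],
    floor: float = 1e-6,  # hypothetical smoothing for unseen events
) -> List[str]:
    """Return the most probable root sequence for `words` (a sketch).

    Hidden states at position t are restricted to candidates[words[t]],
    i.e. the roots assumed to come from the out-of-context analysis.
    """

    def lp(table: Dict, key) -> float:
        # Log-probability, falling back to `floor` for unseen keys.
        return math.log(table.get(key, floor))

    # delta[r]: best log-score of any root path ending in root r so far.
    first = words[0]
    delta = {r: lp(initial, r) + lp(emission, (r, first)) for r in candidates[first]}
    backpointers: List[Dict[str, str]] = []

    for word in words[1:]:
        new_delta: Dict[str, float] = {}
        back: Dict[str, str] = {}
        for r in candidates[word]:
            # Pick the predecessor root that maximizes score + transition.
            prev, score = max(
                ((p, s + lp(transition, (p, r))) for p, s in delta.items()),
                key=lambda item: item[1],
            )
            new_delta[r] = score + lp(emission, (r, word))
            back[r] = prev
        delta = new_delta
        backpointers.append(back)

    # Start from the best final root and follow the backpointers.
    best = max(delta, key=delta.get)
    path = [best]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    return path[::-1]
```

As a toy usage example with invented probabilities and Buckwalter-style transliteration, the unvoweled word `wjd` may come from the root `wjd` ("to find") or be read as w+`jd` with the root `jdd`. Even if `jdd` is more likely in isolation, a following `ktAbA` ("a book", root `ktb`) with a much higher transition probability from `wjd` makes the decoder return `["wjd", "ktb"]`. This is how context overrides the out-of-context preference.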
Keywords :
Arabic NLP, morphological analysis, root extraction, hidden Markov models, Viterbi algorithm
Journal title :
The International Arab Journal of Information Technology (IAJIT)
Record number :
2543553