DocumentCode :
3238844
Title :
Stemming techniques for Arabic words: A comparative study
Author :
Al-Nashashibi, May Y. ; Neagu, D. ; Yaghi, Ali A.
Author_Institution :
Dept. of Comput., Univ. of Bradford, Bradford, UK
fYear :
2010
fDate :
2-4 Nov. 2010
Firstpage :
270
Lastpage :
276
Abstract :
Text interpretation depends, among other things, on a pre-processing stage that effectively extracts a correct stem or root. Since no standard stemmer is available for Arabic, we examine five methods for extracting Arabic roots; the approach with the best results will be used in later work. Four of these methods are based on a positional-letter-ranking approach: the approach itself is investigated along with an adjusted version and two proposed variants. The fifth is a rule-based approach. An algorithm for correcting irregular words is applied to all methods, and all approaches are compared. Accuracy was measured by comparing the extracted roots with a predefined list of roots, using an in-house text collection. Results show that the correction algorithm improved the accuracy of the rule-based method by about 14% and that of the positional-letter-ranking-based algorithms by 7% to 10%. The adjusted positional-letter-ranking method had the highest accuracy among the five algorithms, though only slightly higher than the rule-based one. However, once the correction algorithm was included, the rule-based algorithm had the highest accuracy among all ten algorithms.
Keywords :
knowledge based systems; natural language processing; text analysis; word processing; Arabic root extraction; Arabic word; correction algorithm; positional letter ranking approach; rule based approach; stemming technique; text interpretation; text preprocessing; Art; Data preprocessing; Arabic Root Extraction; Natural Language Processing; Positional Letter Ranking; Rule-Based; Text Mining; Variance; t-test;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Technology and Development (ICCTD), 2010 2nd International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-8844-5
Electronic_ISBN :
978-1-4244-8845-2
Type :
conf
DOI :
10.1109/ICCTD.2010.5645873
Filename :
5645873