Title :
A Novel Algorithm to Extract Tri-Literal Arabic Roots
Author :
Momani, Mohanned ; Faraj, Jamil
Author_Institution :
AABFS, Amman
Abstract :
Stemming and root extraction play a significant role in information retrieval systems, particularly for the Arabic language. In this article, we propose and implement a novel algorithm to extract tri-literal Arabic roots. Rootless words are filtered out first, then prefixes and suffixes are removed. After the term's letters are sorted, doubled letters in the word are removed. Letter removal continues until three letters remain. Finally, the remaining letters are arranged according to their order in the original word. The implementation of the algorithm was tested on two types of Arabic text documents. The results of both runs were promising, with accuracy above 73%.
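The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the rootless-word list, the affix lists, or the rule for choosing which letters to drop, so `ROOTLESS_WORDS`, `PREFIXES`, `SUFFIXES`, and the use of the traditional augmentation letters (the mnemonic "سألتمونيها") as removal candidates are all assumptions made here for illustration, as is the choice to drop candidates in original word order.

```python
# Hypothetical resources: the abstract does not list these, so the values
# below are small illustrative assumptions, not the authors' data.
ROOTLESS_WORDS = {"في", "من", "على", "إلى"}
PREFIXES = ("وال", "بال", "ال")            # longest first
SUFFIXES = ("ون", "ين", "ات", "ان", "ة")
# Assumption: removable letters are the traditional augmentation letters.
AUGMENT_LETTERS = set("سألتمونيها")


def strip_affixes(word):
    """Remove at most one prefix and one suffix, keeping at least 3 letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


def extract_root(word):
    """Return a three-letter root candidate, or None if none is found."""
    # Step 1: filter out rootless words.
    if word in ROOTLESS_WORDS:
        return None

    # Step 2: remove prefixes and suffixes.
    stem = strip_affixes(word)

    # Step 3: sort letters (remembering positions) so duplicates become
    # adjacent, then keep only the earliest occurrence of each letter.
    indexed = sorted(enumerate(stem), key=lambda pair: pair[1])
    unique = []
    for position, letter in indexed:
        if not unique or unique[-1][1] != letter:
            unique.append((position, letter))

    # Step 5 (done early here): restore original word order, so that the
    # removal below walks the letters from the start of the word.
    unique.sort()

    # Step 4: drop candidate letters until three remain (assumed rule).
    while len(unique) > 3:
        for i, (_, letter) in enumerate(unique):
            if letter in AUGMENT_LETTERS:
                del unique[i]
                break
        else:
            break  # nothing removable is left

    if len(unique) != 3:
        return None
    return "".join(letter for _, letter in unique)


print(extract_root("المدرسون"))  # درس
print(extract_root("كاتب"))      # كتب
```

Under these assumptions the sketch recovers the expected root for simple cases, but it will fail on words whose root itself contains a repeated or augmentation letter, which is consistent with the abstract reporting accuracy well below 100%.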
Keywords :
feature extraction; information retrieval; natural language processing; query languages; Arabic language; Arabic text documents; information retrieval systems; letter removal; prefixes-suffixes removal; stemming role; triliteral arabic root extraction; Algorithm design and analysis; Data mining; Information retrieval; Pattern matching; Shape; Sorting; Surface morphology; Testing; Visual BASIC;
Conference_Title :
Computer Systems and Applications, 2007. AICCSA '07. IEEE/ACS International Conference on
Conference_Location :
Amman
Print_ISBN :
1-4244-1030-4
Electronic_ISBN :
1-4244-1031-2
DOI :
10.1109/AICCSA.2007.370899