A Novel Algorithm to Extract Tri-Literal Arabic Roots

Author

Momani, Mohanned ; Faraj, Jamil

Author_Institution

AABFS, Amman

fYear

2007

fDate

13-16 May 2007

Firstpage

309

Lastpage

315

Abstract

Stemming and root extraction play a significant role in information retrieval systems, particularly for the Arabic language. In this article, we propose and implement a novel algorithm to extract tri-literal Arabic roots. Rootless words are first filtered out, and then prefixes and suffixes are removed. Next, the letters of the term are sorted and duplicated letters are removed. Letters are then removed until only three remain. Finally, the remaining letters are arranged according to their order in the original word. The implementation of the algorithm was tested on two types of Arabic text documents, and both runs gave promising results, with an accuracy of over 73%.
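The pipeline in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not give the affix lists, the rootless-word list, or the rule for choosing which letters to drop, so the sketch fills these in with small hypothetical lists (`PREFIXES`, `SUFFIXES`, `ROOTLESS`) and an assumed removal order (`AUGMENT_PRIORITY`). That order is a reordering of the ten Arabic augment letters (سألتمونيها) and was picked only so the examples below work.

```python
from typing import List, Optional

# Hypothetical, deliberately tiny lists for illustration only (not from the paper).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "و", "ب", "ل", "ف"]
SUFFIXES = ["ات", "ون", "ين", "ان", "ها", "هم", "ة", "ه", "ي"]
ROOTLESS = {"في", "من", "على", "إلى", "عن", "هذا", "هذه"}
# Assumed removal priority: the ten augment letters, long vowels first.
AUGMENT_PRIORITY = "اويمأنهتسل"


def strip_affixes(word: str) -> str:
    """Remove at most one prefix and one suffix (longest first), keeping >= 3 letters."""
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[:-len(s)]
            break
    return word


def extract_root(word: str) -> Optional[str]:
    # Step 1: filter out rootless words.
    if word in ROOTLESS:
        return None
    # Step 2: prefix and suffix removal.
    stem = strip_affixes(word)
    # Step 3: sort the letters, then drop adjacent duplicates.
    ordered = sorted(stem)
    letters: List[str] = [c for i, c in enumerate(ordered) if i == 0 or c != ordered[i - 1]]
    # Step 4: remove letters (assumed priority order) until three remain.
    for ch in AUGMENT_PRIORITY:
        if len(letters) <= 3:
            break
        if ch in letters:
            letters.remove(ch)
    if len(letters) != 3:
        return None  # this sketch gives up rather than guess
    # Step 5: restore the letters' order of appearance in the stem.
    return "".join(sorted(letters, key=stem.index))


for w in ["المكتبة", "كاتبون", "مكتوب", "في"]:
    print(w, extract_root(w))
```

With these assumed lists, "المكتبة", "كاتبون" and "مكتوب" all reduce to "كتب", and "في" is filtered out as rootless. Because step 3 collapses repeated letters, a word whose root has a doubled radical ends up with fewer than three distinct letters, and this sketch returns `None` for it.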

Keywords

feature extraction; information retrieval; natural language processing; query languages; Arabic language; Arabic text documents; information retrieval systems; letter removal; prefixes-suffixes removal; stemming role; triliteral Arabic root extraction; Algorithm design and analysis; Data mining; Information retrieval; Pattern matching; Shape; Sorting; Surface morphology; Testing; Visual BASIC;

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer Systems and Applications, 2007. AICCSA '07. IEEE/ACS International Conference on

Conference_Location

Amman

Print_ISBN

1-4244-1030-4

Electronic_ISBN

1-4244-1031-2

Type

conf

DOI

10.1109/AICCSA.2007.370899

Filename

4230974