Author_Institution :
Dept. of Commun. & Networks, Misurata Univ., Misurata, Libya
Abstract :
Arabic stemming algorithms can be classified, according to the desired level of word analysis, into stem-based algorithms (light stemmers) and root-based algorithms. Light stemming algorithms remove only prefixes and suffixes from words, while root-based algorithms remove prefixes, suffixes, and infixes. Several light stemmers exist for Arabic (Light1, Light2, Light3, Light8, and Light10); for information retrieval, the Light10 stemmer outperforms the other light stemmers. This paper studies Arabic stemming algorithms and reviews the literature on Arabic stemmers. In addition, a new Arabic light stemmer is proposed and implemented. The main step of the light stemmer is removing the prefixes and suffixes of words. Because this step changes the meaning of some words, several additional steps are designed and implemented in the proposed stemmer. The proposed stemmer and the Light10 stemmer were tested on the same Arabic data set, which was developed in this work. The accuracy of the Light10 stemmer was 66%, while the accuracy of the proposed stemmer was 88.25%. The reasons for incorrect stemming by the proposed stemmer are also discussed.
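The core prefix and suffix stripping step described in the abstract can be illustrated with a minimal sketch. This is not the proposed stemmer or an exact Light10 implementation: the affix lists are assumptions loosely modeled on lists commonly cited for Light10, and the function name `light_stem` and the `min_len` parameter are hypothetical. The sketch strips at most one prefix and one suffix, trying longer affixes first.

```python
# Illustrative affix lists (assumed, not taken from the paper).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word: str, min_len: int = 2) -> str:
    """Strip at most one prefix and one suffix, longest affix first.

    An affix is removed only if at least `min_len` characters remain.
    """
    # Prefix stripping: try longer prefixes before shorter ones.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break

    # Suffix stripping: same policy on the end of the word.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break

    return word


if __name__ == "__main__":
    for w in ["والكتاب", "المعلمون", "مكتبة", "وقت"]:
        print(w, "->", light_stem(w))
```

With these assumed lists, "المعلمون" (the teachers) reduces to "معلم", while "وقت" (time) loses its initial و even though that letter is part of the word rather than a conjunction. This is the kind of meaning-changing error that motivates the additional steps the abstract mentions.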
Keywords :
information retrieval; natural language processing; Arabic light stemmer; Light1 stemmer; Light10 stemmer; Light2 stemmer; Light3 stemmer; Light8 stemmer; infix removal; light stemming algorithms; prefix removal; root-based algorithms; stem-based algorithms; suffix removal; words analysis; Accuracy; Algorithm design and analysis; Classification algorithms; Information systems; Internet; Technological innovation; Arabic retrieval; Arabic stemming; suffixes and prefixes stripping