• DocumentCode
    675648
  • Title
    An improved Arabic light stemmer
  • Author
    Elrajubi, Osama Mohamed
  • Author_Institution
    Dept. of Commun. & Networks, Misurata Univ., Misurata, Libya
  • fYear
    2013
  • fDate
    27-28 Nov. 2013
  • Firstpage
    33
  • Lastpage
    38
  • Abstract
    Depending on the desired level of word analysis, Arabic stemming algorithms can be classified into stem-based (light stemming) algorithms and root-based algorithms. Light stemming algorithms remove only prefixes and suffixes from words, while root-based algorithms remove prefixes, suffixes, and infixes. Several light stemmers exist for Arabic (Light1, Light2, Light3, Light8, and Light10); for information retrieval, the Light10 stemmer outperformed the other light stemmers. In this paper, Arabic stemming algorithms are studied and a literature review of Arabic stemmers is presented. In addition, a new Arabic light stemmer is proposed and implemented. The main step of a light stemmer is removing the prefixes and suffixes of words; because this step changes the meaning of some words, many additional steps are designed and implemented in the proposed stemmer. The proposed stemmer and the Light10 stemmer were tested on the same Arabic dataset, which was developed in this work. The accuracy rate of the Light10 stemmer was 66%, while that of the proposed stemmer was 88.25%. The reasons for incorrect stemming by the proposed stemmer are also described.
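The light-stemming step described in the abstract (stripping prefixes and suffixes, subject to minimum-length checks) can be sketched as follows. This is a minimal, hypothetical Python illustration, not the author's proposed stemmer: the affix lists and length thresholds are assumptions modeled loosely on those commonly reported for Light10, and the names `light_stem`, `accuracy`, `ARTICLE_PREFIXES`, and `SUFFIXES` are invented for this example. The `accuracy` helper shows how an accuracy rate such as the 66% and 88.25% figures could be computed, namely as the percentage of test words whose output matches a hand-assigned gold stem.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical affix lists, modeled loosely on those reported for Light10.
# The conjunction "و" (and) is handled separately in step 1.
ARTICLE_PREFIXES = ["بال", "كال", "فال", "لل", "ال"]  # longer forms first
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word: str) -> str:
    """Strip Arabic prefixes and suffixes with simple length checks (sketch)."""
    # Step 1: remove a leading "و" if at least 3 characters would remain.
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]
    # Step 2: remove the first matching definite-article prefix
    # if at least 2 characters would remain.
    for prefix in ARTICLE_PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            word = word[len(prefix):]
            break
    # Step 3: one pass over the suffix list in order, removing each suffix
    # found at the end of the word if at least 2 characters would remain.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
    return word


def accuracy(pairs: Iterable[Tuple[str, str]],
             stemmer: Callable[[str], str]) -> float:
    """Percentage of (word, gold_stem) pairs the stemmer gets exactly right."""
    items = list(pairs)
    correct = sum(1 for word, gold in items if stemmer(word) == gold)
    return 100.0 * correct / len(items)


if __name__ == "__main__":
    # Tiny hypothetical gold set, for illustration only.
    gold = [("والكتاب", "كتاب"), ("المعلمون", "معلم"), ("وزير", "وزير")]
    for w, g in gold:
        print(w, "->", light_stem(w), "(gold:", g + ")")
    print(f"accuracy: {accuracy(gold, light_stem):.2f}%")
```

On the last example, stripping the leading و from وزير ("minister") yields زير, which changes the word's meaning. This is the kind of error the abstract attributes to plain prefix and suffix removal, and which the proposed stemmer's additional steps are designed to reduce.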
  • Keywords
    information retrieval; natural language processing; Arabic light stemmer; Light1 stemmer; Light10 stemmer; Light2 stemmer; Light3 stemmer; Light8 stemmer; infix removal; light stemming algorithms; prefix removal; root-based algorithms; stem-based algorithms; suffix removal; words analysis; Accuracy; Algorithm design and analysis; Classification algorithms; Information systems; Internet; Technological innovation; Arabic retrieval; Arabic stemming; suffixes and prefixes stripping
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Research and Innovation in Information Systems (ICRIIS), 2013 International Conference on
  • Conference_Location
    Kuala Lumpur
  • Print_ISBN
    978-1-4799-2486-8
  • Type
    conf
  • DOI
    10.1109/ICRIIS.2013.6716682
  • Filename
    6716682