• DocumentCode
    3564634
  • Title

    Arabic Text Root Extraction via Morphological Analysis and Linguistic Constraints

  • Author

    Alsaad, Amal ; Abbod, Maysam

  • Author_Institution
    Dept. of Electron. & Comput. Eng., Brunel Univ., Uxbridge, UK
  • fYear
    2014
  • Firstpage
    125
  • Lastpage
    130
  • Abstract
    Arabic is a highly inflected language, which makes effective Arabic text analysis with correct stem and root extraction challenging. In this paper we present a linguistic root extraction approach composed of two main phases. In the first phase we remove affixes, including prefixes, suffixes and infixes. Prefixes and suffixes are removed depending on the length of the word, and the word's morphological pattern is checked after each removal in order to remove infixes. In the second phase, the root extraction algorithm is extended to handle weak, hamzated, eliminated-long-vowel and two-letter geminated words, as irregular words make up a relatively large share of Arabic texts. Before a root is returned, it is checked against a predefined list of 3800 triliteral and 900 quadriliteral roots. A series of experiments was conducted to improve and test the performance of the proposed algorithm. The results show that the proportion of correctly extracted roots improved compared with Khoja's stemming algorithm.
  • Keywords
    computational linguistics; data mining; natural language processing; text analysis; Arabic text analysis; Arabic text root extraction; linguistic constraint; morphological analysis; text mining; Algorithm design and analysis; Information retrieval; Pattern matching; Pragmatics; Testing; Arabic root extraction; morphological analyser
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Modelling and Simulation (UKSim), 2014 UKSim-AMSS 16th International Conference on
  • Print_ISBN
    978-1-4799-4923-6
  • Type

    conf

  • DOI
    10.1109/UKSim.2014.43
  • Filename
    7046050
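
The first phase described in the abstract (length-dependent removal of prefixes and suffixes, a pattern check after each removal to drop infixes, and validation against a root list) can be sketched roughly as follows. This is a minimal illustration and not the authors' algorithm: the affix lists, the two patterns, the three-entry root list and the names `match_pattern` and `extract_root` are all assumptions made for the example, and the second phase (weak, hamzated, eliminated-long-vowel and geminated words) is not covered.

```python
from collections import deque

# Hypothetical, deliberately tiny resources for illustration only.
# The paper uses a list of 3800 triliteral and 900 quadriliteral roots.
KNOWN_ROOTS = {"كتب", "درس", "علم"}
PREFIXES = ["ال", "و", "ب", "م"]
SUFFIXES = ["ون", "ات", "ة"]

# Each pattern: (word length, {index: fixed infix/affix letter}, root letter indices).
PATTERNS = [
    (4, {1: "ا"}, (0, 2, 3)),          # فاعل, e.g. كاتب
    (5, {0: "م", 3: "و"}, (1, 2, 4)),  # مفعول, e.g. مكتوب
]


def match_pattern(word):
    """Return the root if the word fits a pattern and the root is in the list."""
    for length, fixed, root_idx in PATTERNS:
        if len(word) != length:
            continue
        if all(word[i] == ch for i, ch in fixed.items()):
            candidate = "".join(word[i] for i in root_idx)
            if candidate in KNOWN_ROOTS:
                return candidate
    return None


def extract_root(word, min_len=3):
    """Strip affixes breadth-first, checking root list and patterns at each step."""
    queue = deque([word])
    seen = set()
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        # Accept only candidates that appear in the predefined root list.
        if current in KNOWN_ROOTS:
            return current
        root = match_pattern(current)
        if root is not None:
            return root
        # Remove an affix only if at least min_len letters remain.
        for prefix in PREFIXES:
            if current.startswith(prefix) and len(current) - len(prefix) >= min_len:
                queue.append(current[len(prefix):])
        for suffix in SUFFIXES:
            if current.endswith(suffix) and len(current) - len(suffix) >= min_len:
                queue.append(current[:-len(suffix)])
    return None


if __name__ == "__main__":
    for w in ["كاتب", "مكتوب", "والكاتبون"]:
        print(w, "->", extract_root(w))
```

The breadth-first search over affix removals is one design choice among several; it simply tries every length-permitted removal and stops at the first candidate confirmed by the root list, so a word with no listed root yields `None` rather than a guess.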