• DocumentCode
    3543488
  • Title

    A new and efficient stemming technique for Arabic Text Categorization

  • Author

    Hadni, M. ; Lachkar, A. ; Ouatik, S. Alaoui

  • Author_Institution
    LIM, Univ. Sidi Mohamed Ben Abdellah (USMBA), Fez, Morocco
  • fYear
    2012
  • fDate
    10-12 May 2012
  • Firstpage
    791
  • Lastpage
    796
  • Abstract
    Preprocessing Arabic text is a challenging and crucial stage in Text Categorization (TC) in particular and Text Mining (TM) in general. Stemming algorithms can be used in Arabic text preprocessing to reduce multiple forms of a word to one form (root or stem). Arabic stemming algorithms can be classified, according to the desired level of analysis, as root-based (e.g., Khoja), stem-based (e.g., Larkey), or statistical (n-gram) approaches. Yet no complete stemmer is available for this language, and the existing stemmers do not achieve high performance. In this paper, in order to improve the accuracy of stemming, and therefore the accuracy of our proposed TC system, an efficient hybrid method is proposed for stemming Arabic text. The effectiveness of the four aforementioned methods was evaluated and compared in terms of the accuracy of the Naïve Bayesian classifier used in our TC system. The proposed stemming algorithm was found to outperform the other stemmers, and the results show that using it enhances the performance of Arabic Text Categorization. The average accuracies are 74.41% for Khoja, 59.71% for light stemming, 48.17% for n-grams, and 82.33% for our stemmer.
  • Keywords
    Bayes methods; data mining; natural language processing; pattern classification; statistical analysis; text analysis; Arabic language; Arabic text categorization; naïve Bayesian classifier; root-based approach; statistical approach; stem-based approach; stemming technique; text mining; text preprocessing; Economics; Education; Indexes; Large scale integration; Lead; Refining; Sociology; Arabic Language; Stemming approaches; Text Categorization;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Multimedia Computing and Systems (ICMCS), 2012 International Conference on
  • Conference_Location
    Tangier
  • Print_ISBN
    978-1-4673-1518-0
  • Type

    conf

  • DOI
    10.1109/ICMCS.2012.6320308
  • Filename
    6320308