Title :
A new and efficient stemming technique for Arabic Text Categorization
Author :
Hadni, M. ; Lachkar, A. ; Ouatik, S. Alaoui
Author_Institution :
LIM, Univ. Sidi Mohamed Ben Abdellah (USMBA), Fez, Morocco
Abstract :
Preprocessing Arabic text is a challenging and crucial stage in Text Categorization (TC) in particular and Text Mining (TM) in general. Stemming algorithms can be used in Arabic text preprocessing to reduce multiple forms of a word to a single form (root or stem). Arabic stemming algorithms can be classified, according to the desired level of analysis, as root-based (e.g., Khoja), stem-based (e.g., Larkey), or statistical (n-gram). Yet no complete stemmer for this language is available: the existing stemmers do not achieve high performance. In this paper, to improve the accuracy of stemming, and therefore the accuracy of our proposed TC system, an efficient hybrid method is proposed for stemming Arabic text. The effectiveness of the four methods (the three existing approaches and the proposed one) was evaluated and compared in terms of the accuracy of the Naïve Bayesian classifier used in our TC system. The proposed stemming algorithm was found to outperform the other stemmers: the results show that it improves the performance of Arabic Text Categorization, with average accuracies of 74.41% for Khoja, 59.71% for light stemming, 48.17% for n-grams, and 82.33% for our stemmer.
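To make two of the stemming families named in the abstract concrete, the following is a minimal Python sketch, not the hybrid method proposed in the paper. The function names (`light_stem`, `char_ngrams`, `dice`), the affix lists, and the `min_len` threshold are illustrative assumptions: the affixes are a small sample of common Arabic prefixes and suffixes, not the exact lists of Larkey's light stemmer, and the Dice coefficient is one common way to compare n-gram sets, assumed here rather than taken from the paper.

```python
# Illustrative affix lists (assumption): a small sample of common Arabic
# prefixes and suffixes, ordered longest first so the longest match wins.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")


def light_stem(word, min_len=3):
    """Stem-based (light) approach: strip at most one prefix and one suffix,
    keeping at least `min_len` characters (hypothetical threshold)."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def char_ngrams(word, n=2):
    """Statistical approach: the set of character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a, b, n=2):
    """Dice similarity between the n-gram sets of two words (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))
```

In this sketch, `light_stem` maps inflected forms that share a stem to the same string, while `dice` lets related word forms be grouped by surface similarity without any linguistic rules. A root-based stemmer such as Khoja's would go further and reduce the stem to its root using pattern matching, which is not attempted here.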
Keywords :
Bayes methods; data mining; natural language processing; pattern classification; statistical analysis; text analysis; Arabic language; Arabic text categorization; naïve Bayesian classifier; root-based approach; statistical approach; stem-based approach; stemming technique; text mining; text preprocessing; Economics; Education; Indexes; Large scale integration; Lead; Refining; Sociology; Arabic Language; Stemming approaches; Text Categorization;
Conference_Titel :
Multimedia Computing and Systems (ICMCS), 2012 International Conference on
Conference_Location :
Tangier
Print_ISBN :
978-1-4673-1518-0
DOI :
10.1109/ICMCS.2012.6320308