DocumentCode
3543488
Title
A new and efficient stemming technique for Arabic Text Categorization
Author
Hadni, M. ; Lachkar, A. ; Ouatik, S. Alaoui
Author_Institution
LIM, Univ. Sidi Mohamed Ben Abdellah (USMBA), Fez, Morocco
fYear
2012
fDate
10-12 May 2012
Firstpage
791
Lastpage
796
Abstract
Text preprocessing of the Arabic language is a challenging and crucial stage in Text Categorization (TC) in particular and Text Mining (TM) in general. Stemming algorithms can be used in Arabic text preprocessing to reduce multiple forms of a word to one form (root or stem). Arabic stemming algorithms can be classified, according to the desired level of analysis, as root-based (e.g., Khoja), stem-based (Larkey), or statistical (n-gram) approaches. Yet no complete stemmer for this language is available: the existing stemmers do not achieve high performance. In this paper, in order to improve the accuracy of stemming and therefore the accuracy of our proposed TC system, an efficient hybrid method is proposed for stemming Arabic text. The effectiveness of the four methods was evaluated and compared in terms of the accuracy of the Naïve Bayesian classifier used in our TC system. The proposed stemming algorithm was found to outperform the other stemmers: the results show that using the proposed stemmer enhances the performance of Arabic Text Categorization. The average accuracies are 74.41% for Khoja, 59.71% for light stemming, 48.17% for n-grams, and 82.33% for our stemmer.
Keywords
Bayes methods; data mining; natural language processing; pattern classification; statistical analysis; text analysis; Arabic language; Arabic text categorization; naïve Bayesian classifier; root-based approach; statistical approach; stem-based approach; stemming technique; text mining; text preprocessing; Economics; Education; Indexes; Large scale integration; Lead; Refining; Sociology; Arabic Language; Stemming approaches; Text Categorization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Multimedia Computing and Systems (ICMCS), 2012 International Conference on
Conference_Location
Tangier
Print_ISBN
978-1-4673-1518-0
Type
conf
DOI
10.1109/ICMCS.2012.6320308
Filename
6320308