Title :
Stemming as a feature reduction technique for Arabic Text Categorization
Author :
Harrag, Fouzi ; El-Qawasmah, Eyas ; Al-Salman, Abdul Malik S
Author_Institution :
Comput. Sci. Dept., Farhat ABBAS Univ., Setif, Algeria
Abstract :
This paper presents a comparative study of three text preprocessing techniques for the Arabic text categorization problem, using an in-house Arabic dataset. We evaluate and compare three stemming techniques: light stemming, root-based stemming, and dictionary-lookup stemming. The purpose is to reduce the feature space to an input space of much lower dimension for two state-of-the-art classifiers: artificial neural networks and support vector machines. The results show that using a light stemmer enhances the performance of Arabic text categorization. They also show that the proposed artificial neural network model achieves high categorization effectiveness as measured by the macro-average F1 measure.
Keywords :
natural language processing; neural nets; pattern classification; support vector machines; text analysis; Arabic text categorization; artificial neural networks; classifiers; dictionary-lookup-stemming; feature reduction technique; in-house Arabic dataset; light-stemming; macro-average F1 measure; root-based-stemming; support vector machines; text preprocessing techniques; Artificial neural networks; Classification algorithms; Feature extraction; Support vector machine classification; Text categorization; Vectors; Arabic Text Categorization; Artificial Neural Network; Feature Reduction; Support Vector Machines;
Conference_Title :
Programming and Systems (ISPS), 2011 10th International Symposium on
Conference_Location :
Algiers
Print_ISBN :
978-1-4577-0905-0
DOI :
10.1109/ISPS.2011.5898874