Title :
Improving Arabic text categorization using decision trees
Author :
Harrag, Fouzi ; El-Qawasmeh, Eyas ; Pichappan, Pit
Author_Institution :
Comput. Sci. Dept., Farhat ABBAS Univ., Setif, Algeria
Abstract :
This paper presents the results of classifying Arabic text documents using a decision tree algorithm. Experiments were performed on two self-collected corpora, and the results show that the proposed hybrid approach, Document Frequency Thresholding combined with the information gain criterion embedded in the decision tree algorithm, is the preferable feature selection criterion. The study concluded that the improved classifier is highly effective, achieving a generalization accuracy of about 0.93 on the scientific corpus and 0.91 on the literary corpus. It also concluded that the effectiveness of the decision tree classifier increased with the size of the training set, and that the nature of the corpus influences the classifier's performance.
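The abstract does not spell out the authors' exact procedure, so the following is only a minimal Python sketch of the general idea behind a hybrid of Document Frequency (DF) thresholding and information gain (IG), not the paper's implementation. It assumes documents are already tokenized into sets of terms, that features are binary term presence, and that IG is computed as in an ID3-style decision tree split. The function names and the parameters `min_df` and `k` are hypothetical: DF thresholding first discards rare terms, then the surviving terms are ranked by IG.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def df_threshold(docs, min_df):
    """DF thresholding: keep terms that occur in at least min_df documents."""
    df = Counter(term for doc in docs for term in set(doc))
    return {term for term, count in df.items() if count >= min_df}


def information_gain(docs, labels, term):
    """IG of splitting the collection on presence/absence of `term`
    (the split criterion used by ID3-style decision trees)."""
    with_term = [y for doc, y in zip(docs, labels) if term in doc]
    without_term = [y for doc, y in zip(docs, labels) if term not in doc]
    n = len(labels)
    remainder = (len(with_term) / n) * entropy(with_term) + (
        len(without_term) / n
    ) * entropy(without_term)
    return entropy(labels) - remainder


def select_features(docs, labels, min_df, k):
    """Hypothetical hybrid selection: DF filter first, then keep the k
    surviving terms with the highest IG (ties broken alphabetically)."""
    candidates = df_threshold(docs, min_df)
    ranked = sorted(candidates, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


if __name__ == "__main__":
    # Toy, made-up data: placeholder tokens standing in for Arabic terms.
    docs = [{"science", "cell"}, {"science", "atom"}, {"poem", "verse"}, {"poem", "novel"}]
    labels = ["sci", "sci", "lit", "lit"]
    print(select_features(docs, labels, min_df=2, k=2))
```

On the toy data, only "science" and "poem" pass a DF threshold of 2, and each separates the two classes perfectly (IG of 1 bit). Filtering by DF before computing IG keeps the more expensive IG ranking to a smaller candidate set; the actual threshold and ranking used in the paper are not given here.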
Keywords :
classification; data mining; decision trees; natural language processing; text analysis; Arabic text categorization; Arabic text document classification; decision trees; document frequency thresholding; feature selection; information gain; text mining; Classification tree analysis; Computer science; Decision trees; Frequency; Natural languages; Performance gain; Support vector machine classification; Support vector machines; Testing; Text categorization; Arabic Corpus; Decision Tree Algorithm; Feature Selection; Text Categorization; Text Mining;
Conference_Title :
Networked Digital Technologies, 2009. NDT '09. First International Conference on
Conference_Location :
Ostrava
Print_ISBN :
978-1-4244-4614-8
Electronic_ISBN :
978-1-4244-4615-5
DOI :
10.1109/NDT.2009.5272214