DocumentCode
3533280
Title
Improving Arabic text categorization using decision trees
Author
Harrag, Fouzi ; El-Qawasmeh, Eyas ; Pichappan, Pit
Author_Institution
Comput. Sci. Dept., Farhat ABBAS Univ., Setif, Algeria
fYear
2009
fDate
28-31 July 2009
Firstpage
110
Lastpage
115
Abstract
This paper presents the results of classifying Arabic text documents using a decision tree algorithm. Experiments are performed on two self-collected corpora, and the results show that the suggested hybrid approach, Document Frequency Thresholding combined with the information gain criterion embedded in the decision tree algorithm, is the preferable feature selection criterion. The study concludes that the improved classifier is highly effective, with a generalization accuracy of about 0.93 for the scientific corpus and 0.91 for the literary corpus. It also concludes that the effectiveness of the decision tree classifier increases with the training set size, and that the nature of the corpus influences classifier performance.
Keywords
classification; data mining; decision trees; natural language processing; text analysis; Arabic text categorization; Arabic text document classification; decision trees; document frequency thresholding; feature selection; information gain; text mining; Classification tree analysis; Computer science; Decision trees; Frequency; Natural languages; Performance gain; Support vector machine classification; Support vector machines; Testing; Text categorization; Arabic Corpus; Decision Tree Algorithm; Feature Selection; Text Categorization; Text Mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Networked Digital Technologies, 2009. NDT '09. First International Conference on
Conference_Location
Ostrava
Print_ISBN
978-1-4244-4614-8
Electronic_ISBN
978-1-4244-4615-5
Type
conf
DOI
10.1109/NDT.2009.5272214
Filename
5272214