DocumentCode
605240
Title
Arabic Text Classification Based on Features Reduction Using Artificial Neural Networks
Author
Zaghoul, F.A. ; Al-Dhaheri, S.
Author_Institution
Dept. of Comput. Inf. Syst., Univ. of Jordan, Amman, Jordan
fYear
2013
fDate
10-12 April 2013
Firstpage
485
Lastpage
490
Abstract
Text classification is the process of assigning texts to one or more predefined categories based on their content. Due to the increased availability of documents in digital form and the rapid growth of online information, text classification has become one of the key techniques for handling and organizing text data. Although a huge amount of textual information is available online and grows every day, effective retrieval is becoming more difficult. Text categorization is one way to tackle this problem. In this paper, we present and analyze the results of applying an Artificial Neural Network (ANN) to the classification of Arabic-language documents. Work on the automatic categorization of Arabic documents using ANNs is limited. The system's primary source of knowledge is an Arabic text categorization (TC) corpus built locally at the University of Jordan and available at http://nlp.ju.edu.jo. This corpus is used to construct and test the ANN model. Methods for assigning term weights that reflect the importance of each term, and methods for feature reduction, are discussed. Each Arabic document is represented using the term weighting scheme. Since the number of unique words in the collection is large, feature reduction methods are used to select the most relevant features for classification. The experimental results show that the ANN model with feature reduction achieves better results than the basic ANN in classifying Arabic documents.
Keywords
classification; feature extraction; information retrieval; natural language processing; neural nets; text analysis; ANN model; Arabic language document classification; Arabic text categorization; Arabic text classification; TC; University of Jordan; artificial neural network; digital form; features reduction; online information; retrieval; text data handling; text data organizing; Accuracy; Artificial neural networks; Biological neural networks; Frequency measurement; Support vector machine classification; Text categorization; Training; neural network; pca; text classification; vector space model;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Modelling and Simulation (UKSim), 2013 UKSim 15th International Conference on
Conference_Location
Cambridge
Print_ISBN
978-1-4673-6421-8
Type
conf
DOI
10.1109/UKSim.2013.135
Filename
6527466