DocumentCode :
605240
Title :
Arabic Text Classification Based on Features Reduction Using Artificial Neural Networks
Author :
Zaghoul, F.A. ; Al-Dhaheri, S.
Author_Institution :
Dept. of Comput. Inf. Syst., Univ. of Jordan, Amman, Jordan
fYear :
2013
fDate :
10-12 April 2013
Firstpage :
485
Lastpage :
490
Abstract :
Text classification is the process of grouping texts into one or more predefined categories based on their content. Due to the increased availability of documents in digital form and the rapid growth of online information, text classification has become one of the key techniques for handling and organizing text data. Although a huge amount of textual information is available online and grows every day, effective retrieval is becoming more difficult. Text categorization is one way to tackle this problem. In this paper, we present and analyze the results of applying an Artificial Neural Network (ANN) to the classification of Arabic-language documents. Work on the automatic categorization of Arabic documents using ANNs is limited. The system's primary source of knowledge is an Arabic text categorization (TC) corpus built locally at the University of Jordan and available at http://nlp.ju.edu.jo. This corpus is used to construct and test the ANN model. Methods for assigning term weights that reflect the importance of each term, and for feature reduction, are discussed. Each Arabic document is represented using the term weighting scheme. Since the number of unique words in the collection is large, feature reduction methods are used to select the most relevant features for classification. The experimental results show that the ANN model with feature reduction achieves better results than the basic ANN in classifying Arabic documents.
Keywords :
classification; feature extraction; information retrieval; natural language processing; neural nets; text analysis; ANN model; Arabic language document classification; Arabic text categorization; Arabic text classification; TC; University of Jordan; artificial neural network; digital form; features reduction; online information; retrieval; text data handling; text data organizing; Accuracy; Artificial neural networks; Biological neural networks; Frequency measurement; Support vector machine classification; Text categorization; Training; neural network; PCA; text classification; vector space model
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Modelling and Simulation (UKSim), 2013 UKSim 15th International Conference on
Conference_Location :
Cambridge
Print_ISBN :
978-1-4673-6421-8
Type :
conf
DOI :
10.1109/UKSim.2013.135
Filename :
6527466