  • DocumentCode
    605240
  • Title
    Arabic Text Classification Based on Features Reduction Using Artificial Neural Networks
  • Author
    Zaghoul, F.A.; Al-Dhaheri, S.
  • Author_Institution
    Dept. of Comput. Inf. Syst., Univ. of Jordan, Amman, Jordan
  • fYear
    2013
  • fDate
    10-12 April 2013
  • Firstpage
    485
  • Lastpage
    490
  • Abstract
    Text classification is the process of assigning texts to one or more predefined categories based on their content. Owing to the increasing availability of documents in digital form and the rapid growth of online information, text classification has become one of the key techniques for handling and organizing text data. As the huge volume of textual information available online grows every day, effective retrieval is becoming more difficult, and text categorization is one way to tackle this problem. In this paper, we present and analyze the results of applying an Artificial Neural Network (ANN) to the classification of Arabic-language documents, an area in which work on automatic categorization using ANNs is limited. The system's primary source of knowledge is an Arabic text categorization (TC) corpus built locally at the University of Jordan and available at http://nlp.ju.edu.jo. This corpus is used to construct and test the ANN model. Term-weighting and feature-reduction methods that reflect the importance of each term are discussed, and each Arabic document is represented using the term-weighting scheme. Because the number of unique words in the collection is large, feature reduction methods are used to select the most relevant features for classification. The experimental results show that the ANN model with feature reduction achieves better results than the basic ANN in classifying Arabic documents.
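The abstract outlines a three-stage pipeline: weight each term, reduce the feature set, then train a neural classifier on the reduced vectors. The sketch below illustrates that flow under stated assumptions and is not the authors' implementation. The weighting scheme is assumed to be plain TF-IDF (tf * log(N / df)); the reduction step is a simple, hypothetical "keep the k terms with the largest summed TF-IDF weight" selection, which is a swapped-in technique and not the PCA listed among the index terms; and the ANN is stood in for by a single-layer multi-class perceptron. Tokenization is whitespace splitting with no Arabic stemming or normalization. All function names (`fit_idf`, `select_features`, `build_classifier`, etc.) are hypothetical.

```python
import math
from collections import Counter


def fit_idf(tokenized_docs):
    """Inverse document frequency log(N / df) for every term seen in training (assumed variant)."""
    n_docs = len(tokenized_docs)
    df = Counter(term for tokens in tokenized_docs for term in set(tokens))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms never seen in training are dropped."""
    tf = Counter(tokens)
    return {term: tf[term] * idf[term] for term in tf if term in idf}


def select_features(vectors, k):
    """Hypothetical reduction step: keep the k terms with the largest summed TF-IDF weight."""
    totals = Counter()
    for vec in vectors:
        totals.update(vec)  # adds each term's weight to its running total
    # Sort by total weight (descending), then by term, so ties are broken deterministically.
    return sorted(totals, key=lambda term: (-totals[term], term))[:k]


def to_dense(vec, vocab):
    """Project a sparse vector onto the reduced vocabulary."""
    return [vec.get(term, 0.0) for term in vocab]


def score(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(model, x):
    """Return the class whose linear score is highest."""
    weights, biases = model
    return max(weights, key=lambda c: score(weights[c], biases[c], x))


def train_perceptron(X, y, classes, epochs=20):
    """Multi-class perceptron: a single-layer network standing in for the paper's ANN."""
    weights = {c: [0.0] * len(X[0]) for c in classes}
    biases = {c: 0.0 for c in classes}
    model = (weights, biases)
    for _ in range(epochs):
        for x, target in zip(X, y):
            guess = predict(model, x)
            if guess != target:
                # Move the correct class toward x and the wrong guess away from it.
                for i, xi in enumerate(x):
                    weights[target][i] += xi
                    weights[guess][i] -= xi
                biases[target] += 1.0
                biases[guess] -= 1.0
    return model


def build_classifier(docs, labels, k):
    """Weight terms, reduce to k features, train; returns (vocab, classify function)."""
    tokenized = [doc.split() for doc in docs]  # whitespace split; no stemming or normalization
    idf = fit_idf(tokenized)
    sparse = [tfidf(tokens, idf) for tokens in tokenized]
    vocab = select_features(sparse, k)
    X = [to_dense(vec, vocab) for vec in sparse]
    model = train_perceptron(X, labels, sorted(set(labels)))
    return vocab, lambda text: predict(model, to_dense(tfidf(text.split(), idf), vocab))
```

For example, training on four tiny documents (two about sports, two about the economy) with `k=2` keeps only the two highest-weighted terms, and the perceptron still separates the classes on that reduced space. A real system would add Arabic-specific preprocessing (normalization, stop-word removal, stemming) and a multi-layer network trained with backpropagation.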
  • Keywords
    classification; feature extraction; information retrieval; natural language processing; neural nets; text analysis; ANN model; Arabic language document classification; Arabic text categorization; Arabic text classification; TC; University of Jordan; artificial neural network; digital form; features reduction; online information; retrieval; text data handling; text data organizing; Accuracy; Artificial neural networks; Biological neural networks; Frequency measurement; Support vector machine classification; Text categorization; Training; neural network; pca; text classification; vector space model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Modelling and Simulation (UKSim), 2013 UKSim 15th International Conference on
  • Conference_Location
    Cambridge
  • Print_ISBN
    978-1-4673-6421-8
  • Type
    conf
  • DOI
    10.1109/UKSim.2013.135
  • Filename
    6527466