• DocumentCode
    2625558
  • Title

    Arabic Script Web Documents Language Identification Using Decision Tree-ARTMAP Model

  • Author

    Selamat, Ali ; Ching, Ng Choon ; Mikami, Yoshiki

  • Author_Institution
    Univ. Teknologi Malaysia, Skudai
  • fYear
    2007
  • fDate
    21-23 Nov. 2007
  • Firstpage
    721
  • Lastpage
    726
  • Abstract
    Automatic language identification (LID) is a topic of great significance in the areas of intelligence and security, where the language of any related material needs to be identified before any information can be processed. When the recognition elements of any content are dynamic and obtained directly from written text, the language associated with each grammar item has to be identified using that text. Many methods proposed in the literature focus on Roman and Asian languages. This paper describes text-based language identification approaches for Arabic script. Two different approaches are compared. The decision tree method, commonly used in many application domains, is reviewed first. We also applied a simple method for language identification that is based on an adaptive resonance learning (ART) neural network. The experimental results show that the decision tree model achieved higher accuracy than the ARTMAP model. However, the decision tree model may be less reliable than the ARTMAP model if the set of languages is extended to other Arabic-script languages. It is assumed that a hybrid of both models would perform better and merits further development.
  • Keywords
    ART neural nets; decision trees; grammars; identification; natural language processing; text analysis; ART neural network; Arabic script Web documents language identification; Asian languages; Roman languages; adaptive resonance learning; automatic language identification; decision tree-ARTMAP model; grammar; text-based language identification; Computer science; Conference management; Decision trees; Information technology; Management information systems; Materials science and technology; Natural languages; Neural networks; Resonance; Subspace constraints;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Convergence Information Technology, 2007. International Conference on
  • Conference_Location
    Gyeongju
  • Print_ISBN
    0-7695-3038-9
  • Type

    conf

  • DOI
    10.1109/ICCIT.2007.402
  • Filename
    4420344