DocumentCode :
2625558
Title :
Arabic Script Web Documents Language Identification Using Decision Tree-ARTMAP Model
Author :
Selamat, Ali ; Ching, Ng Choon ; Mikami, Yoshiki
Author_Institution :
Univ. Teknologi Malaysia, Skudai
fYear :
2007
fDate :
21-23 Nov. 2007
Firstpage :
721
Lastpage :
726
Abstract :
Automatic language identification (LID) is of great significance in the areas of intelligence and security, where the language of any related material must be identified before its information can be processed. When the recognition elements of content are dynamic and obtained directly from written text, the language associated with each grammar item has to be identified from that text. Many of the methods proposed in the literature focus on Roman and Asian languages. This paper describes text-based language identification approaches for Arabic script. Two approaches are compared. The decision tree method, commonly used in many application domains, is reviewed first. We also apply a simple language identification method based on an adaptive resonance learning (ART) neural network. The experimental results show that the decision tree model achieved higher accuracy than the ARTMAP model. However, unlike the ARTMAP model, the decision tree model may not be reliable if the languages covered are extended to other Arabic-script languages. It is assumed that a hybrid of both models would perform better and merits further development.
Keywords :
ART neural nets; decision trees; grammars; identification; natural language processing; text analysis; ART neural network; Arabic script Web documents language identification; Asian languages; Roman languages; adaptive resonance learning; automatic language identification; decision tree-ARTMAP model; grammar; text-based language identification; Computer science; Conference management; Decision trees; Information technology; Management information systems; Materials science and technology; Natural languages; Neural networks; Resonance; Subspace constraints;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Convergence Information Technology, 2007. International Conference on
Conference_Location :
Gyeongju
Print_ISBN :
0-7695-3038-9
Type :
conf
DOI :
10.1109/ICCIT.2007.402
Filename :
4420344