DocumentCode :
2698387
Title :
Improved Letter Weighting Feature Selection on Arabic Script Language Identification
Author :
Ng, Choon-Ching ; Selamat, Ali
Author_Institution :
Fac. of Comput. Sci. & Inf. Syst., Univ. Teknol. Malaysia, Skudai, Malaysia
fYear :
2009
fDate :
1-3 April 2009
Firstpage :
150
Lastpage :
154
Abstract :
Language identification is the process of automatically identifying the predefined language of a document; this paper focuses on Web documents. Initially, we applied letter frequency as features, combined with neural networks, to Arabic script language identification. However, the reliability of the letters selected as features is a major issue to be overcome. Therefore, we propose an improved letter weighting feature selection method to enhance the effectiveness of language identification. It is based on the concept of letter frequency and document frequency. In our experiments, the improved letter weighting feature selection achieved the highest accuracy, 99.75%, on Arabic script language identification.
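The abstract does not give the weighting formula, so the following is only an illustrative sketch of how letter frequency and document frequency might be combined to rank candidate letters. The function names, and the choice of multiplying mean letter frequency by document frequency, are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def letter_frequency(text):
    """Relative frequency of each alphabetic character in one document."""
    letters = [ch for ch in text if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    return {ch: n / total for ch, n in Counter(letters).items()}


def document_frequency(docs):
    """Fraction of documents in which each letter occurs at least once."""
    df = Counter()
    for text in docs:
        df.update({ch for ch in text if ch.isalpha()})
    return {ch: n / len(docs) for ch, n in df.items()}


def letter_weights(docs):
    """Hypothetical weight: mean letter frequency times document frequency.

    This product is an assumed combination, not the paper's formula.
    """
    df = document_frequency(docs)
    lfs = [letter_frequency(text) for text in docs]
    return {
        ch: (sum(lf.get(ch, 0.0) for lf in lfs) / len(docs)) * df[ch]
        for ch in df
    }


def select_letters(docs, k):
    """Return the k highest-weighted letters (ties broken by code point)."""
    weights = letter_weights(docs)
    return sorted(weights, key=lambda ch: (-weights[ch], ch))[:k]
```

Under these assumptions, the selected letters would define the input feature vector for a classifier such as a neural network, with one such ranking computed per language from its training documents.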
Keywords :
document handling; natural language processing; neural nets; Arabic script language identification; Web documents; improved letter weighting feature selection; neural networks; Computer science; Database systems; Deductive databases; Encoding; Feature extraction; Frequency; Information retrieval; Information systems; Natural languages; Neural networks; Arabic Script; Feature Selection; Language Identification; Letter Weighting;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Information and Database Systems, 2009. ACIIDS 2009. First Asian Conference on
Conference_Location :
Dong Hoi
Print_ISBN :
978-0-7695-3580-7
Type :
conf
DOI :
10.1109/ACIIDS.2009.33
Filename :
5175984