DocumentCode :
2169960
Title :
Identifying the Dominant Language of Web Page Using Supervised N-grams
Author :
Choon-Ching Ng ; Siau-Chuin Liew ; Hussin, W.M.S.W. ; Herawan, Tutut
Author_Institution :
Fac. of Comput. Syst. & Software Eng., Univ. Malaysia Pahang, Pekan, Malaysia
fYear :
2012
fDate :
26-28 Nov. 2012
Firstpage :
344
Lastpage :
348
Abstract :
Natural language processing is an emerging technology in the linguistic industry and an aid to human-computer interaction in computer science. Language identification is a form of pattern recognition that identifies the predefined language of a web page or predicts the unknown language of a given text. Written texts are built from common features such as characters, words, and n-grams, and the characteristics of these features are distinctive to each language. In the experiments, the supervised n-gram approach identifies languages accurately and outperforms the distance measurement on Arabic-script web pages.
Keywords :
Web sites; natural language processing; support vector machines; text analysis; Arabic script Web page; Web page dominant language identification; computer science; distance measurement; human-computer interaction; linguistic industry; pattern recognition; supervised N-grams; support vector machine; text language; written text; Arabic script; language identification
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Computer Science Applications and Technologies (ACSAT), 2012 International Conference on
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-1-4673-5832-3
Type :
conf
DOI :
10.1109/ACSAT.2012.74
Filename :
6516378