DocumentCode :
2321924
Title :
Unknown Word Recognition Based on Maximal Cliques
Author :
Chen, Hao ; Xiao, Bo ; Lin, Zhiqing
Author_Institution :
Sch. of Inf. & Commun. Eng., Beijing Univ. of Posts & Telecommun., Beijing, China
fYear :
2011
fDate :
10-12 Oct. 2011
Firstpage :
230
Lastpage :
233
Abstract :
Unknown word recognition is a key issue in Chinese information processing. Traditional algorithms for unknown word recognition fall broadly into two types: rule-based methods and statistical methods. However, both have limitations in identifying unknown words that are created on the Internet. Such words follow no obvious rules and are composed of common words, which limits rule-based methods, while statistical methods are limited by their reliance on mutual information. This paper therefore proposes an unknown word recognition algorithm that is based on the bigram model and mines maximal cliques to identify unknown words from the Internet. Experimental results show that the algorithm achieves higher accuracy than traditional statistical methods based on the N-gram model.
Keywords :
Internet; natural language processing; statistical analysis; Chinese information processing; N-gram model; bigram model; maximal cliques; rule-based method; statistical method; unknown word recognition; Accuracy; Correlation; Equations; Mutual information; Noise
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2011 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4577-1827-4
Type :
conf
DOI :
10.1109/CyberC.2011.46
Filename :
6079386