DocumentCode :
1634481
Title :
Chinese Unknown Word Recognition Using Improved Conditional Random Fields
Author :
Xu, Yisu ; Wang, Xuan ; Tang, Buzhou ; Wang, Xiaolong
Author_Institution :
Dept. of Comput. Sci., Harbin Inst. of Technol., Shenzhen
Volume :
2
fYear :
2008
Firstpage :
363
Lastpage :
367
Abstract :
Unknown word recognition is an important problem in natural language processing, and it strongly influences the performance of dictionary construction and word segmentation. This paper introduces two methods for improving Chinese unknown word recognition with Conditional Random Fields (CRFs): rough labeling of characters and N-best listing. A CRF that uses the two proposed methods increases the recall rate of out-of-vocabulary words (ROOV), the key measure in unknown word recognition, by 15% over the original CRF model. It matches the highest OOV recall rate in the SIGHAN Bakeoff 2005 closed test on the Peking University (PKU) corpus, while achieving a much higher recall rate of in-vocabulary words (RIV) than the other systems.
Keywords :
natural language processing; random processes; text analysis; vocabulary; word processing; Chinese unknown word recognition; N-best listing; conditional random field; dictionary construction; word segmentation; Application software; Character recognition; Computer science; Costs; Dictionaries; Entropy; Intelligent systems; Natural languages; Testing; Unknown words recognition
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Systems Design and Applications, 2008. ISDA '08. Eighth International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-0-7695-3382-7
Type :
conf
DOI :
10.1109/ISDA.2008.283
Filename :
4696359