Title :
Chinese Unknown Word Recognition Using Improved Conditional Random Fields
Author :
Xu, Yisu ; Wang, Xuan ; Tang, Buzhou ; Wang, Xiaolong
Author_Institution :
Dept. of Comput. Sci., Harbin Inst. of Technol., Shenzhen
Abstract :
Unknown word recognition is an important problem in natural language processing, with a strong influence on dictionary construction and word segmentation. This paper introduces two methods for improving Chinese unknown word recognition with Conditional Random Fields (CRFs): rough labeling of characters and N-best listing. A CRF using the two proposed methods increases the out-of-vocabulary recall rate (ROOV), the key measure for unknown word recognition, by 15% over the original CRF model. It matches the highest OOV recall rate reported in the SIGHAN Bakeoff 2005 closed test on the Peking University (PKU) corpus, while achieving a much higher in-vocabulary recall rate (RIV) than the other systems.
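The two measures named in the abstract, ROOV and RIV, can be illustrated with a minimal sketch. It assumes the usual segmentation-evaluation convention that a gold word counts as recalled when the predicted segmentation contains a word with exactly the same character span; the function names, the toy vocabulary, and the example sentence are hypothetical and are not taken from the paper.

```python
def spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    out, pos = set(), 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out


def oov_iv_recall(gold, pred, vocab):
    """Return (R_OOV, R_IV) for one sentence.

    gold, pred: lists of words covering the same character sequence.
    vocab: set of words seen in training (assumed definition of in-vocabulary).
    """
    pred_spans = spans(pred)
    hit = {"oov": 0, "iv": 0}
    total = {"oov": 0, "iv": 0}
    pos = 0
    for w in gold:
        kind = "iv" if w in vocab else "oov"
        total[kind] += 1
        # A gold word is recalled only if the prediction has the same span.
        if (pos, pos + len(w)) in pred_spans:
            hit[kind] += 1
        pos += len(w)
    r_oov = hit["oov"] / total["oov"] if total["oov"] else 0.0
    r_iv = hit["iv"] / total["iv"] if total["iv"] else 0.0
    return r_oov, r_iv


# Hypothetical toy data: "深圳" is absent from the training vocabulary.
vocab = {"我", "喜欢"}
gold = ["我", "喜欢", "深圳"]
pred = ["我", "喜", "欢", "深圳"]
r_oov, r_iv = oov_iv_recall(gold, pred, vocab)
```

In this toy case the unknown word is segmented correctly while one known word is split, so the OOV recall is higher than the IV recall. A corpus-level score would accumulate the hit and total counts over all sentences before dividing.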
Keywords :
natural language processing; random processes; text analysis; vocabulary; word processing; Chinese unknown word recognition; N-best listing; conditional random field; dictionary construction; word segmentation; Application software; Character recognition; Computer science; Costs; Dictionaries; Entropy; Intelligent systems; Natural languages; Testing; Unknown words recognition
Conference_Title :
Intelligent Systems Design and Applications, 2008. ISDA '08. Eighth International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-0-7695-3382-7
DOI :
10.1109/ISDA.2008.283