DocumentCode :
442060
Title :
Toward a unified approach to lexicon optimization and perplexity minimization for Chinese language modeling
Author :
Xiong, Ying ; Zhu, Jie
Author_Institution :
Dept. of Electron. Eng., Shanghai Jiao Tong Univ., China
Volume :
6
fYear :
2005
fDate :
18-21 Aug. 2005
Firstpage :
3824
Abstract :
This paper presents a unified approach to lexicon optimization and perplexity minimization for Chinese language modeling (LM). Instead of using a non-iterative segmentation-detection method, the proposed approach iteratively extracts candidate words, selects new words based on a perplexity minimization criterion, and adds the new words to the lexicon. The augmented lexicon, which contains the new words, is used in the next iteration to re-segment the input corpus, and the process repeats until the perplexity of the LM converges. The experiments show that both the precision and recall rates are improved and that the perplexity of the LM is reduced by 6.3%.
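The iterative loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the segmenter, the language model, or how candidates are extracted. The sketch assumes greedy forward maximum matching for segmentation, a maximum-likelihood unigram LM scored on the same corpus, candidates formed by joining adjacent tokens, and per-character perplexity so that scores stay comparable as the lexicon changes. The names `segment`, `char_perplexity`, and `optimize_lexicon` are hypothetical.

```python
import math
from collections import Counter


def segment(text, lexicon):
    """Greedy forward maximum matching (an assumed segmenter).

    Characters not covered by any lexicon word become single-character tokens.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                tokens.append(piece)
                i += n
                break
    return tokens


def char_perplexity(sentences, lexicon):
    """Per-character perplexity of a maximum-likelihood unigram LM (assumed model)."""
    tokens = [t for s in sentences for t in segment(s, lexicon)]
    counts = Counter(tokens)
    total = len(tokens)
    log_prob = sum(c * math.log(c / total) for c in counts.values())
    n_chars = sum(len(s) for s in sentences)
    return math.exp(-log_prob / n_chars)


def optimize_lexicon(sentences, lexicon, tol=1e-6, max_iter=20):
    """Add one word per iteration while it lowers perplexity by more than tol."""
    lexicon = set(lexicon)
    best = char_perplexity(sentences, lexicon)
    for _ in range(max_iter):
        # Candidate words: adjacent token pairs under the current segmentation.
        candidates = set()
        for s in sentences:
            toks = segment(s, lexicon)
            candidates.update(a + b for a, b in zip(toks, toks[1:]))
        candidates -= lexicon
        if not candidates:
            break
        # Re-segment with each candidate added and keep the lowest perplexity.
        ppl, word = min((char_perplexity(sentences, lexicon | {w}), w)
                        for w in candidates)
        if best - ppl <= tol:  # perplexity has converged
            break
        lexicon.add(word)
        best = ppl
    return lexicon, best
```

Calling `optimize_lexicon(corpus_sentences, initial_lexicon)` returns the augmented lexicon and the final per-character perplexity. Scoring every candidate by re-segmenting the whole corpus is only practical for toy inputs; a real system would need a cheaper selection step and held-out data for the perplexity estimate.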
Keywords :
natural languages; optimisation; word processing; Chinese language modeling; lexicon optimization; perplexity minimization; words extraction; Minimization methods; Natural languages; Chinese language modeling; new words extraction; perplexity; word segmentation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location :
Guangzhou, China
Print_ISBN :
0-7803-9091-1
Type :
conf
DOI :
10.1109/ICMLC.2005.1527606
Filename :
1527606