Title :
Toward a unified approach to lexicon optimization and perplexity minimization for Chinese language modeling
Author :
Xiong, Ying ; Zhu, Jie
Author_Institution :
Dept. of Electron. Eng., Shanghai Jiao Tong Univ., China
Abstract :
This paper presents a unified approach to lexicon optimization and perplexity minimization for Chinese language modeling (LM). Instead of using a non-iterative segmentation-detection method, the proposed approach iteratively extracts candidate words, selects new words according to a perplexity minimization criterion, and adds them to the lexicon. The augmented lexicon is then used in the next iteration to re-segment the input corpus, and the process repeats until the perplexity of the LM converges. Experiments show that both precision and recall improve and that the perplexity of the LM is reduced by 6.3%.
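The loop described in the abstract (extract candidates, select by perplexity, augment the lexicon, re-segment, repeat until convergence) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: greedy forward maximum matching as the segmenter, adjacent token pairs as candidate words, a maximum-likelihood unigram LM, per-character perplexity (so that values stay comparable when the token count changes), and the hypothetical names `segment`, `char_perplexity`, and `optimize_lexicon`.

```python
import math
from collections import Counter


def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching (an assumed segmenter).

    Characters not covered by the lexicon become single-character tokens.
    """
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                tokens.append(piece)
                i += n
                break
    return tokens


def char_perplexity(sentences):
    """Per-character perplexity of a maximum-likelihood unigram LM,
    estimated and evaluated on the same segmented corpus (an assumption)."""
    counts = Counter(tok for sent in sentences for tok in sent)
    total = sum(counts.values())
    log_prob = sum(c * math.log(c / total) for c in counts.values())
    n_chars = sum(len(tok) * c for tok, c in counts.items())
    return math.exp(-log_prob / n_chars)


def optimize_lexicon(corpus, lexicon, max_len=4, min_count=2, tol=1e-6):
    """Iteratively add the candidate word that lowers perplexity the most;
    stop when no candidate improves perplexity by more than tol."""
    lexicon = set(lexicon)
    segmented = [segment(s, lexicon, max_len) for s in corpus]
    best_ppl = char_perplexity(segmented)
    while True:
        # Candidate words: concatenations of adjacent tokens (assumed heuristic).
        pairs = Counter(a + b for sent in segmented for a, b in zip(sent, sent[1:]))
        candidates = [w for w, c in pairs.items()
                      if c >= min_count and len(w) <= max_len and w not in lexicon]
        choice = None
        for w in sorted(candidates):
            # Re-segment the corpus with the trial lexicon and re-score it.
            trial = [segment(s, lexicon | {w}, max_len) for s in corpus]
            ppl = char_perplexity(trial)
            if ppl < best_ppl - tol and (choice is None or ppl < choice[1]):
                choice = (w, ppl, trial)
        if choice is None:
            return lexicon, best_ppl  # perplexity has converged
        word, best_ppl, segmented = choice
        lexicon.add(word)
```

Calling `optimize_lexicon(corpus, initial_lexicon)` on a list of unsegmented strings returns the augmented lexicon and the final per-character perplexity. Adding one word per iteration keeps the sketch simple; a practical system would likely add words in batches and score with a higher-order n-gram model on held-out data.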
Keywords :
natural languages; optimisation; word processing; Chinese language modeling; lexicon optimization; perplexity minimization; words extraction; Minimization methods; new words extraction; perplexity; word segmentation
Conference_Title :
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location :
Guangzhou, China
Print_ISBN :
0-7803-9091-1
DOI :
10.1109/ICMLC.2005.1527606