Title :
Analyzing the properties of smoothing methods for language models
Author :
Huang, Feng-Long ; Yu, Ming-shing
Author_Institution :
Dept. of Appl. Math., Nat. Chung-Hsing Univ., Taichung, Taiwan
Abstract :
The authors discuss the properties of several frequently used smoothing methods for language models. Because of the data sparseness problem, smoothing methods are employed to estimate the probability of each n-gram (covering both seen and unseen events). There are several well-known smoothing methods: Additive discount, Good-Turing, Witten-Bell, Katz and Absolute discount. We propose a set of properties to evaluate the statistical behaviors of these methods. Furthermore, a smoothing scheme, the Huang-Yu method, is presented, which complies with all the proposed properties.
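As a rough illustration of what smoothing does for seen and unseen n-grams, the sketch below applies the textbook additive (add-delta) estimate to bigrams. It is not taken from the paper and is not the Huang-Yu method; the function name `additive_bigram_probs`, the toy sentence, and the choice `delta=1.0` are assumptions made only for this example.

```python
from collections import Counter


def additive_bigram_probs(tokens, vocab, delta=1.0):
    """Build p(word | prev) with additive (add-delta) smoothing.

    Hypothetical helper for illustration only; delta=1.0 gives
    the classic add-one (Laplace) estimate.
    """
    # Count each adjacent pair and each token that serves as a context.
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])
    vocab_size = len(vocab)

    def prob(prev, word):
        # Every bigram, seen or unseen, receives delta extra counts;
        # the denominator grows by delta * |V| so the distribution
        # over the vocabulary still sums to one.
        numerator = bigram_counts[(prev, word)] + delta
        denominator = context_counts[prev] + delta * vocab_size
        return numerator / denominator

    return prob


# Toy corpus (an assumption, not data from the paper).
tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))
p = additive_bigram_probs(tokens, vocab)

seen = p("the", "cat")    # bigram observed once in the toy corpus
unseen = p("the", "sat")  # bigram never observed
total = sum(p("the", w) for w in vocab)
```

In this sketch the unseen bigram gets a small non-zero probability, the seen bigram keeps a larger one, and the conditional distribution for a given context still sums to one. These are the kinds of statistical behaviors that a set of evaluation properties for smoothing methods would examine.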
Keywords :
linguistics; natural languages; probability; smoothing methods; Absolute discount; Additive discount; Good-Turing; Huang-Yu method; Katz; Witten-Bell; data sparseness problem; frequent smoothing methods; language models; n-gram; natural language processing; seen events; statistical behaviors; unseen events; Entropy; Mathematical model; Maximum likelihood estimation; Speech synthesis; Tellurium; Testing
Conference_Title :
Systems, Man, and Cybernetics, 2001 IEEE International Conference on
Conference_Location :
Tucson, AZ
Print_ISBN :
0-7803-7087-2
DOI :
10.1109/ICSMC.2001.969865