DocumentCode
376273
Title
Analyzing the properties of smoothing methods for language models
Author
Huang, Feng-Long ; Yu, Ming-shing
Author_Institution
Dept. of Appl. Math., Nat. Chung-Hsing Univ., Taichung, Taiwan
Volume
1
fYear
2001
fDate
2001
Firstpage
512
Abstract
The authors discuss the properties of several commonly used smoothing methods for language models. Because of the data sparseness problem, smoothing methods are employed to estimate the probability of each n-gram, covering both seen and unseen events. Several smoothing methods are well known: Additive discount, Good-Turing, Witten-Bell, Katz, and Absolute discount. The authors propose a set of properties for evaluating the statistical behavior of these methods. They also present a smoothing scheme, the Huang-Yu method, which complies with all the proposed properties.
Keywords
linguistics; natural languages; probability; smoothing methods; Absolute discount; Additive discount; Good-Turing; Huang-Yu method; Katz; Witten-Bell; data sparseness problem; frequent smoothing methods; language models; n-gram; natural language processing; probability; seen events; statistical behaviors; unseen events; Entropy; Mathematical model; Maximum likelihood estimation; Natural language processing; Natural languages; Probability; Smoothing methods; Speech synthesis; Tellurium; Testing
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems, Man, and Cybernetics, 2001 IEEE International Conference on
Conference_Location
Tucson, AZ
ISSN
1062-922X
Print_ISBN
0-7803-7087-2
Type
conf
DOI
10.1109/ICSMC.2001.969865
Filename
969865