DocumentCode :
2789582
Title :
Power law discounting for n-gram language models
Author :
Huang, Songfang ; Renals, Steve
Author_Institution :
Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK
fYear :
2010
fDate :
14-19 March 2010
Firstpage :
5178
Lastpage :
5181
Abstract :
We present an approximation to the Bayesian hierarchical Pitman-Yor process language model that maintains the power law distribution over word tokens without requiring a computationally expensive approximate inference process. This approximation, which we term power law discounting, has computational complexity similar to that of interpolated and modified Kneser-Ney smoothing. We performed experiments on meeting transcription using the NIST RT06s evaluation data and the AMI corpus, with a vocabulary of 50,000 words and a language model training set of up to 211 million words. Our results indicate that power law discounting yields statistically significant reductions in perplexity and word error rate compared to both interpolated and modified Kneser-Ney smoothing, while producing results similar to those of the hierarchical Pitman-Yor process language model.
Keywords :
Bayes methods; approximation theory; computational complexity; computational linguistics; interpolation; natural language processing; word processing; Bayesian hierarchical Pitman-Yor process; Kneser-Ney smoothing; approximation; language model; n-gram language model; power law discounting; power law distribution; word error rate; word token; Ambient intelligence; Automatic speech recognition; Bayesian methods; Error analysis; NIST; Natural languages; Smoothing methods; Speech processing; Vocabulary; Bayesian; Kneser-Ney; Pitman-Yor; absolute discount; power law; smoothing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
ISSN :
1520-6149
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2010.5495007
Filename :
5495007