DocumentCode :
2180581
Title :
Bayesian class-based language models
Author :
Su, Yi
Author_Institution :
Nuance Commun., Inc., Montreal, QC, Canada
fYear :
2011
fDate :
22-27 May 2011
Firstpage :
5564
Lastpage :
5567
Abstract :
By capturing the intuition that "similar words appear in similar contexts", the Class-based Language Model (CLM) has found success from research projects to business products. However, most CLMs make the simplifying assumption that each word belongs to exactly one class, which poorly models the fact that many words have multiple senses and thus should belong to multiple classes. We propose a Bayesian formulation of the CLM in which a many-to-many mapping between words and classes, i.e., soft clustering, is naturally supported. A simple collapsed Gibbs sampler is provided to carry out the inference. Not only did we achieve a 22% relative reduction in perplexity on a Wall Street Journal corpus, but we also reduced the word error rate of a state-of-the-art conversational telephony speech recognizer by 6% relative.
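Abstract_Note :
To illustrate the kind of inference the abstract describes, the sketch below is a hypothetical, minimal collapsed Gibbs sampler for a class-based bigram language model with Dirichlet priors. It is not the authors' implementation (the paper uses a hierarchical Pitman-Yor formulation); all names and hyperparameters here (gibbs_sample, alpha, beta, num_classes) are illustrative assumptions. Because each token carries its own latent class, a word type can be assigned to different classes in different contexts, which is the many-to-many word/class mapping ("soft clustering") the abstract refers to.

```python
# Hypothetical collapsed Gibbs sampler for a Bayesian class-based bigram LM
# (HMM-style sketch with Dirichlet priors; NOT the paper's Pitman-Yor model).
import random
from collections import defaultdict


def gibbs_sample(tokens, vocab_size, num_classes, alpha=1.0, beta=0.1,
                 num_iters=50, seed=0):
    """tokens: list of integer word ids; returns a class label per token."""
    rng = random.Random(seed)
    K, V = num_classes, vocab_size
    z = [rng.randrange(K) for _ in tokens]   # latent class of each token
    trans = defaultdict(int)                 # (c_prev, c_next) -> count
    trans_from = [0] * K                     # transitions leaving class c
    emit = defaultdict(int)                  # (c, word) -> count
    n_class = [0] * K                        # tokens assigned to class c

    for i, w in enumerate(tokens):           # initialise counts
        c = z[i]
        n_class[c] += 1
        emit[(c, w)] += 1
        if i > 0:
            trans[(z[i - 1], c)] += 1
            trans_from[z[i - 1]] += 1

    for _ in range(num_iters):
        for i, w in enumerate(tokens):
            old = z[i]
            prev_c = z[i - 1] if i > 0 else None
            next_c = z[i + 1] if i + 1 < len(tokens) else None

            # Remove token i's contribution from all counts.
            n_class[old] -= 1
            emit[(old, w)] -= 1
            if prev_c is not None:
                trans[(prev_c, old)] -= 1
                trans_from[prev_c] -= 1
            if next_c is not None:
                trans[(old, next_c)] -= 1
                trans_from[old] -= 1

            # Collapsed conditional: emission, transition in, transition out.
            probs = []
            for c in range(K):
                p = (emit[(c, w)] + beta) / (n_class[c] + V * beta)
                if prev_c is not None:
                    p *= (trans[(prev_c, c)] + alpha) / \
                         (trans_from[prev_c] + K * alpha)
                if next_c is not None:
                    # Corrections account for the incoming transition we are
                    # about to add when prev_c equals the candidate class c.
                    num_corr = 1 if (prev_c == c and next_c == c) else 0
                    den_corr = 1 if prev_c == c else 0
                    p *= (trans[(c, next_c)] + num_corr + alpha) / \
                         (trans_from[c] + den_corr + K * alpha)
                probs.append(p)

            # Sample a new class proportionally and restore the counts.
            r = rng.random() * sum(probs)
            new = 0
            for new, p in enumerate(probs):
                r -= p
                if r <= 0:
                    break
            z[i] = new
            n_class[new] += 1
            emit[(new, w)] += 1
            if prev_c is not None:
                trans[(prev_c, new)] += 1
                trans_from[prev_c] += 1
            if next_c is not None:
                trans[(new, next_c)] += 1
                trans_from[new] += 1
    return z
```

In this toy setup, the learned transition and emission counts (plus the Dirichlet pseudo-counts) would define the class bigram model whose perplexity and word error rate the abstract reports improving.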
Keywords :
Bayes methods; belief networks; natural language processing; speech recognition; text analysis; Bayesian class based language model; Bayesian formulation; CLM; Gibbs sampler; Wall Street Journal corpus; business products; conversational telephony speech recognizer; many-to-many mapping; similar context; similar words; word error rate reduction; Bayesian methods; Computational modeling; Markov processes; Smoothing methods; Speech; Training; Vocabulary; Bayesian statistics; Gibbs sampling; class-based language model; hierarchical Pitman-Yor process;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
ISSN :
1520-6149
Print_ISBN :
978-1-4577-0538-0
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2011.5947620
Filename :
5947620