Title :
Mixture of mixture n-gram language models
Author :
Sak, Hasim ; Allauzen, Cyril ; Nakajima, Kensuke ; Beaufays, Francoise
Abstract :
This paper presents a language model adaptation technique for building a single static language model from a set of language models, each trained on a separate text corpus, so as to maximize the likelihood of an adaptation data set given as a development set of sentences. The proposed model can be considered a mixture of mixture language models. The top-level mixture is a sentence-level mixture model in which each sentence is assumed to be drawn from one of a discrete set of topic or task clusters. After a cluster is selected, each n-gram is assumed to be drawn from one of the given n-gram language models. We estimate the cluster mixture weights and the per-cluster n-gram language model mixture weights with the expectation-maximization (EM) algorithm, seeking the parameter estimates that maximize the likelihood of the development sentences. This mixture of mixture models can be represented efficiently as a static n-gram language model using the previously proposed Bayesian language model interpolation technique. We show significant improvements in both perplexity and word error rate (WER) over the standard one-level interpolation scheme.
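The EM estimation described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name, the toy per-word probability arrays (standing in for real component n-gram LM scores on a development set), and the iteration count are all assumptions for the sake of a runnable example. The E-step computes cluster responsibilities per sentence and component responsibilities per word; the M-step re-estimates the cluster priors and per-cluster component weights.

```python
import numpy as np

def em_mixture_of_mixtures(sent_probs, K, iters=50, seed=0):
    """Estimate cluster priors lam[k] and per-cluster component LM
    weights w[k, m] by EM, maximizing development-set likelihood.

    sent_probs: list of (T_s, M) arrays; entry [t, m] is the
    probability that component LM m assigns to word t of sentence s
    (assumed precomputed; illustrative stand-ins for n-gram scores).
    """
    rng = np.random.default_rng(seed)
    M = sent_probs[0].shape[1]
    lam = np.full(K, 1.0 / K)              # uniform cluster priors
    w = rng.dirichlet(np.ones(M), size=K)  # random (K, M) LM weights

    for _ in range(iters):
        # E-step (clusters): log P(s | k) = sum_t log sum_m w[k,m] P[t,m]
        log_post = np.zeros((len(sent_probs), K))
        for s, P in enumerate(sent_probs):
            log_post[s] = np.log(lam) + np.log(P @ w.T).sum(axis=0)
        log_post -= log_post.max(axis=1, keepdims=True)  # stabilize
        gamma = np.exp(log_post)
        gamma /= gamma.sum(axis=1, keepdims=True)  # P(cluster k | sentence s)

        # M-step: update cluster priors and per-cluster component weights
        lam = gamma.sum(axis=0) / len(sent_probs)
        counts = np.zeros((K, M))
        for s, P in enumerate(sent_probs):
            mix = P @ w.T                                        # (T, K)
            r = w[None, :, :] * P[:, None, :] / mix[:, :, None]  # (T, K, M)
            counts += gamma[s][:, None] * r.sum(axis=0)
        w = counts / counts.sum(axis=1, keepdims=True)
    return lam, w

# Toy development set: per-word probabilities from M=2 component LMs;
# two sentences favor LM 0, one favors LM 1 (illustrative numbers only).
dev = [
    np.column_stack([np.full(6, 0.30), np.full(6, 0.01)]),
    np.column_stack([np.full(6, 0.30), np.full(6, 0.01)]),
    np.column_stack([np.full(6, 0.01), np.full(6, 0.30)]),
]
lam, w = em_mixture_of_mixtures(dev, K=2)
```

In the paper's final step, the learned weights `lam` and `w` would be compiled into a single static n-gram model via Bayesian language model interpolation; the sketch above covers only the weight estimation.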
Keywords :
Bayes methods; expectation-maximization algorithm; interpolation; maximum likelihood estimation; natural language processing; pattern clustering; speech recognition; text analysis; Bayesian language model interpolation; WER; adaptation data set; cluster selection; language model adaptation technique; one-level interpolation scheme; likelihood maximization; n-gram language model mixture weights; parameter estimation; sentence development set; sentence-level mixture model; single static language model; speech recognition system; text corpus; adaptation models; clustering algorithms; data models; hidden Markov models; adaptation; language model; mixture models
Conference_Title :
2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
Conference_Location :
Olomouc
DOI :
10.1109/ASRU.2013.6707701