High-performance connected digit recognition using maximum mutual information estimation

Author

Normandin, Yves ; Cardin, Régis ; de Mori, Renato

Author_Institution

Centre de Recherche Inf., McGill Coll., Montreal, Que., Canada

Volume

2

Issue

2

fYear

1994

fDate

4/1/1994 12:00:00 AM

Firstpage

299

Lastpage

311

Abstract

Hidden Markov models (HMMs) are one of the most powerful speech recognition tools available today. Even so, the inadequacies of HMMs as a “correct” modeling framework for speech are well known. In this context, it is argued in this paper that the maximum mutual information estimation (MMIE) formulation for training is more appropriate than maximum likelihood estimation (MLE) for reducing the error rate. Corrective MMIE training is introduced. It is a very efficient new training algorithm which uses a modified version of a discrete reestimation formula recently proposed by Gopalakrishnan et al. (see IEEE Trans. Inform. Theory, Jan. 1991). Reestimation formulas are proposed for the case of diagonal Gaussian densities and their convergence properties are experimentally demonstrated. A description of how these formulas are integrated into our training algorithm is given. Using the MMIE framework for training, it is shown how weighting the contribution of different parameter sets in the computation of output probabilities introduces substantial recognition improvements. Using the TIDIGITS connected digit corpus, a large number of experiments are performed with the ideas, techniques, and algorithms presented in this paper. These experiments show that MMIE systematically provides substantial error rate reductions with respect to MLE alone and that, thanks to the new training techniques, these results can be obtained at an acceptable computational cost. The best results obtained in the experiments were 0.29% word error rate and 0.89% string error rate on the adult portion of the corpus.
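
The contrast the abstract draws between MLE and MMIE can be illustrated with a minimal sketch. It is not the paper's corrective training algorithm or its reestimation formulas; it only evaluates the two objective functions on a toy problem. The class names, the toy data, the equal priors, and all function names below are hypothetical, and single diagonal Gaussians stand in for full HMMs.

```python
import math

def diag_gauss_loglik(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def mle_objective(data, models):
    """Sum of log p(x | correct class); only the correct model matters."""
    return sum(diag_gauss_loglik(x, *models[label]) for x, label in data)

def mmie_objective(data, models, log_priors):
    """Sum of log [p(x | correct class) / p(x)]; competing models matter too."""
    total = 0.0
    for x, label in data:
        ll = {c: diag_gauss_loglik(x, *models[c]) for c in models}
        log_px = logsumexp([log_priors[c] + ll[c] for c in models])
        total += ll[label] - log_px
    return total

# Hypothetical toy setup: two classes, 2-dimensional observations,
# each model given as (mean vector, variance vector).
models = {"one": ([0.0, 0.0], [1.0, 1.0]), "two": ([2.0, 2.0], [1.0, 1.0])}
log_priors = {"one": math.log(0.5), "two": math.log(0.5)}
data = [([0.1, -0.2], "one"), ([1.9, 2.3], "two"), ([1.2, 0.9], "one")]

print("MLE objective: ", mle_objective(data, models))
print("MMIE objective:", mmie_objective(data, models, log_priors))
```

In this sketch, moving a competing model away from a correctly labeled sample raises the MMIE objective while leaving the MLE objective unchanged, which is the discriminative behavior that motivates MMIE as a route to lower error rates.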

Keywords

hidden Markov models; information theory; parameter estimation; speech recognition; stochastic processes; HMM; MMIE; TIDIGITS connected digit corpus; connected digit recognition; convergence properties; diagonal Gaussian densities; discrete reestimation formula; error rate reduction; maximum mutual information estimation; output probabilities; string error rate; training algorithm; word error rate; Automatic speech recognition; Convergence; Costs; Error analysis; Maximum likelihood decoding; Maximum likelihood estimation; Mutual information

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/89.279279

Filename

279279