Trust Region-Based Optimization for Maximum Mutual Information Estimation of HMMs in Speech Recognition

Author

Cong Liu ; Yu Hu ; Li-Rong Dai ; Hui Jiang

Author_Institution

iFlytek Res., Hefei, China

Volume

19

Issue

8

fYear

2011

Firstpage

2474

Lastpage

2485

Abstract

In this paper, we propose two novel optimization methods for discriminative training (DT) of hidden Markov models (HMMs) in speech recognition, based on an efficient global optimization algorithm for the so-called trust region (TR) problem, in which a quadratic function is minimized under a spherical constraint. In the first method, maximum mutual information estimation (MMIE) of Gaussian mixture HMMs is formulated as a standard TR problem, so that the efficient global optimization method can be used in each iteration to maximize the auxiliary function of discriminative training. In the second method, we construct a new auxiliary function for DT of HMMs by adding a quadratic penalty term. The new auxiliary function serves as a first-order approximation as well as a lower bound of the original discriminative objective function within a locality constraint. Owing to the lower-bound property, the optimal point found for the new auxiliary function is guaranteed to improve the original discriminative objective function until it converges to a local optimum or stationary point of the objective function. Both TR-based optimization methods are evaluated on two standard large-vocabulary continuous speech recognition tasks, using the WSJ0 and Switchboard databases. Experimental results show that the proposed TR methods outperform the conventional extended Baum-Welch (EBW) method in terms of convergence behavior as well as recognition performance.
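
As an illustration of the TR problem named in the abstract (minimizing a quadratic function under a spherical constraint), the following is a minimal sketch and not the algorithm used in the paper. It assumes a small dense symmetric matrix A, a vector b, and a radius, and finds the Lagrange multiplier by bisection with a Cholesky test. All function names are hypothetical, and the degenerate "hard case" (b orthogonal to the eigenvectors of the smallest eigenvalue of A) is not handled.

```python
import math

def cholesky(M):
    """Return lower-triangular L with L L^T = M, or None if M is not positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 1e-12:
                    return None  # pivot not positive: M is not (numerically) PD
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def solve_shifted(A, b, lam):
    """Solve (A + lam*I) x = -b; return None if A + lam*I is not positive definite."""
    n = len(b)
    M = [[A[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(M)
    if L is None:
        return None
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = -b
        y[i] = (-b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def trust_region_step(A, b, radius, iters=200):
    """Minimize 0.5 x^T A x + b^T x subject to ||x|| <= radius (hard case ignored)."""
    x = solve_shifted(A, b, 0.0)
    if x is not None and norm(x) <= radius:
        return x  # unconstrained minimizer already lies inside the ball
    lo = 0.0
    # Upper bracket: at this shift the matrix is PD and ||x|| <= radius.
    hi = math.sqrt(sum(a * a for row in A for a in row)) + norm(b) / radius + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        x = solve_shifted(A, b, mid)
        if x is None or norm(x) > radius:
            lo = mid  # shift too small: not PD yet, or step still too long
        else:
            hi = mid  # shift large enough: step fits inside the ball
    return solve_shifted(A, b, hi)
```

For example, `trust_region_step([[1.0, 0.0], [0.0, -2.0]], [1.0, 1.0], 1.0)` returns a point on the unit circle even though the quadratic is indefinite. This is the sense in which the constrained problem can be solved globally despite non-convexity.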

Keywords

Gaussian processes; approximation theory; hidden Markov models; quadratic programming; speech recognition; Gaussian mixture HMM; WSJ0 databases; auxiliary function; convergence behavior; discriminative objective function; discriminative training; first-order approximation; global optimization algorithm; hidden Markov models; lower-bound property; maximum mutual information estimation; objective function; quadratic penalty term; spherical constraint; standard large-vocabulary continuous speech recognition tasks; Switchboard databases; trust region-based optimization; Automatic speech recognition; Hidden Markov models; Mutual information; Optimization; Discriminative training; global optimization; lower-bounded auxiliary function; trust region problem

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

Conference_Location

4/21/2011 12:00:00 AM

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2011.2144969

Filename

5753922