HMM training based on quality measurement

Author

Gao, Yuqing ; Jan, Ea-Ee ; Padmanabhan, Mukund ; Picheny, Michael

Author_Institution

IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA

Volume

1

fYear

1999

fDate

15-19 Mar 1999

Firstpage

129

Abstract

Two discriminant measures for HMM states are presented to improve the effectiveness of HMM training. In HMM-based speech recognition, the context-dependent states are usually modeled by Gaussian mixture distributions. In general, the number of Gaussian mixtures for each state is fixed or proportional to the amount of training data. Our study shows that some states are “non-aggressive” compared to others and require a higher acoustic resolution. Two methods are presented in this paper to identify those non-aggressive states. The first uses the recognition accuracy of the states, and the second is based on a rank distribution of states. Baseline systems trained with a fixed number of Gaussian mixtures for each state, having 33 K and 120 K Gaussians, yield word error rates of 14.57% and 13.04%, respectively. Using our approach, a 38 K Gaussian system was constructed that reduces the error rate to 13.95%. The average ranks of non-aggressive states in rank lists of testing data were also seen to improve dramatically compared to the baseline systems.
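The abstract does not define the two measures precisely, so the following is only a minimal sketch of how a per-state recognition accuracy and an average rank might be computed from frame-level scores. The function names `state_quality` and `flag_states`, the input layout, and the `max_avg_rank` threshold are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def state_quality(frames):
    """Compute (accuracy, average rank) for each reference state.

    frames: iterable of (correct_state, {state: score}) pairs, where a
    higher score means a better acoustic match (hypothetical layout).
    """
    ranks = defaultdict(list)
    for correct, scores in frames:
        # Rank 1 means the correct state scored highest on this frame.
        ordered = sorted(scores, key=scores.get, reverse=True)
        ranks[correct].append(ordered.index(correct) + 1)
    return {
        state: (
            sum(r == 1 for r in rs) / len(rs),  # fraction of frames ranked first
            sum(rs) / len(rs),                  # average rank over frames
        )
        for state, rs in ranks.items()
    }


def flag_states(quality, max_avg_rank):
    """Return states whose average rank is worse than the assumed threshold."""
    return sorted(s for s, (_, avg) in quality.items() if avg > max_avg_rank)


# Toy data: state "a" is always ranked first, state "b" never is.
frames = [
    ("a", {"a": -1.0, "b": -2.0, "c": -3.0}),
    ("a", {"a": -1.5, "b": -2.0, "c": -3.0}),
    ("b", {"a": -1.0, "b": -2.0, "c": -3.0}),
    ("b", {"a": -1.0, "b": -4.0, "c": -3.0}),
]
quality = state_quality(frames)
candidates = flag_states(quality, max_avg_rank=2.0)
```

In this sketch, states returned by `flag_states` would be the candidates for additional Gaussians. How the paper actually selects states and allocates mixture components is not stated in the abstract.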

Keywords

Gaussian processes; error statistics; hidden Markov models; speech recognition; Gaussian mixture distributions; HMM states; HMM training; acoustic resolution; average ranks; baseline systems; context-dependent states; discriminant measures; error rate reduction; nonaggressive states; quality measurement; rank distribution; rank lists; recognition accuracy; testing data; training data; word error rates; Acoustic applications; Acoustic measurements; Acoustic testing; Context modeling; Error analysis; Hidden Markov models; Speech recognition; System performance; System testing; Training data

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on

Conference_Location

Phoenix, AZ

ISSN

1520-6149

Print_ISBN

0-7803-5041-3

Type

conf

DOI

10.1109/ICASSP.1999.758079

Filename

758079