Dynamically configurable acoustic models for speech recognition

Author

Hwang, Mei-Yuh ; Huang, Xuedong

Author_Institution

Microsoft Corp., Redmond, WA, USA

Volume

2

fYear

1998

fDate

12-15 May 1998

Firstpage

669

Abstract

Senones were introduced to share hidden Markov model (HMM) parameters at a sub-phonetic level, as proposed by Hwang and Huang (1992), and decision trees were incorporated to predict unseen phonetic contexts, as suggested by Hwang, Huang and Alleva (1993). We describe two applications of the senonic decision tree: (1) dynamically downsizing a speech recognition system for small platforms and (2) sharing the Gaussian covariances of continuous density HMMs (CHMMs). We experimented with how to balance different parameters to obtain the best trade-off between recognition accuracy and system size. The dynamically downsized system, without retraining, performed even better than the regular Baum-Welch (1972) trained system. The shared covariance model performed as well as the unshared full model and thus gave us the freedom to increase the number of Gaussian means to improve the accuracy of the model. Combining the downsizing and covariance sharing algorithms, we achieved a total error reduction of 8% over the Baum-Welch trained system with approximately the same parameter size.

Keywords

Gaussian processes; acoustic signal processing; covariance analysis; decision theory; error statistics; hidden Markov models; speech processing; speech recognition; trees (mathematics); Baum-Welch trained system; CHMM; Gaussian covariances; Gaussian means; HMM parameters sharing; Hidden Markov model; continuous density HMM; covariance sharing algorithm; downsizing algorithm; dynamically configurable acoustic models; error reduction; model accuracy; parameter size; performance; phonetic contexts prediction; recognition accuracy; senones; senonic decision tree; shared covariance model; speech recognition; speech recognition system downsizing; sub-phonetic level; system size; unshared full model; Acoustic testing; Decision trees; Density functional theory; Error analysis; Hidden Markov models; Histograms; Resource management; Speech recognition; Statistics; Training data;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on

Conference_Location

Seattle, WA

ISSN

1520-6149

Print_ISBN

0-7803-4428-6

Type

conf

DOI

10.1109/ICASSP.1998.675353

Filename

675353