DocumentCode
323758
Title
Dynamically configurable acoustic models for speech recognition
Author
Hwang, Mei-Yuh ; Huang, Xuedong
Author_Institution
Microsoft Corp., Redmond, WA, USA
Volume
2
fYear
1998
fDate
12-15 May 1998
Firstpage
669
Abstract
Senones were introduced to share hidden Markov model (HMM) parameters at a sub-phonetic level, as proposed by Hwang and Huang (1992), and decision trees were incorporated to predict unseen phonetic contexts, as suggested by Hwang, Huang and Alleva (1993). We describe two applications of the senonic decision tree: (1) dynamically downsizing a speech recognition system for small platforms and (2) sharing the Gaussian covariances of continuous density HMMs (CHMMs). We experimented with balancing different parameters to find the best trade-off between recognition accuracy and system size. The dynamically downsized system, without retraining, performed even better than the regular Baum-Welch (1972) trained system. The shared covariance model performed as well as the unshared full model and thus gave us the freedom to increase the number of Gaussian means to improve the accuracy of the model. Combining the downsizing and covariance sharing algorithms, a total error reduction of 8% was achieved over the Baum-Welch trained system with approximately the same parameter size.
Keywords
Gaussian processes; acoustic signal processing; covariance analysis; decision theory; error statistics; hidden Markov models; speech processing; speech recognition; trees (mathematics); Baum-Welch trained system; CHMM; Gaussian covariances; Gaussian means; HMM parameters sharing; Hidden Markov model; continuous density HMM; covariance sharing algorithm; downsizing algorithm; dynamically configurable acoustic models; error reduction; model accuracy; parameter size; performance; phonetic contexts prediction; recognition accuracy; senones; senonic decision tree; shared covariance model; speech recognition; speech recognition system downsizing; sub-phonetic level; system size; unshared full model; Acoustic testing; Decision trees; Density functional theory; Error analysis; Hidden Markov models; Histograms; Resource management; Speech recognition; Statistics; Training data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
Conference_Location
Seattle, WA
ISSN
1520-6149
Print_ISBN
0-7803-4428-6
Type
conf
DOI
10.1109/ICASSP.1998.675353
Filename
675353