DocumentCode :
323758
Title :
Dynamically configurable acoustic models for speech recognition
Author :
Hwang, Mei-Yuh ; Huang, Xuedong
Author_Institution :
Microsoft Corp., Redmond, WA, USA
Volume :
2
fYear :
1998
fDate :
12-15 May 1998
Firstpage :
669
Abstract :
Senones were introduced to share hidden Markov model (HMM) parameters at a sub-phonetic level, as proposed by Hwang and Huang (1992), and decision trees were incorporated to predict unseen phonetic contexts, as suggested by Hwang, Huang and Alleva (1993). We describe two applications of the senonic decision tree: (1) dynamically downsizing a speech recognition system for small platforms and (2) sharing the Gaussian covariances of continuous density HMMs (CHMMs). We experimented with how to balance different parameters to obtain the best trade-off between recognition accuracy and system size. The dynamically downsized system, without retraining, performed even better than the regular Baum-Welch (1972) trained system. The shared covariance model performed as well as the unshared full model and thus gave us the freedom to increase the number of Gaussian means to improve the accuracy of the model. Combining the downsizing and covariance sharing algorithms, we achieved a total error reduction of 8% over the Baum-Welch trained system with approximately the same parameter size.
Keywords :
Gaussian processes; acoustic signal processing; covariance analysis; decision theory; error statistics; hidden Markov models; speech processing; speech recognition; trees (mathematics); Baum-Welch trained system; CHMM; Gaussian covariances; Gaussian means; HMM parameters sharing; Hidden Markov model; continuous density HMM; covariance sharing algorithm; downsizing algorithm; dynamically configurable acoustic models; error reduction; model accuracy; parameter size; performance; phonetic contexts prediction; recognition accuracy; senones; senonic decision tree; shared covariance model; speech recognition; speech recognition system downsizing; sub-phonetic level; system size; unshared full model; Acoustic testing; Decision trees; Density functional theory; Error analysis; Hidden Markov models; Histograms; Resource management; Speech recognition; Statistics; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
Conference_Location :
Seattle, WA
ISSN :
1520-6149
Print_ISBN :
0-7803-4428-6
Type :
conf
DOI :
10.1109/ICASSP.1998.675353
Filename :
675353