DocumentCode :
1690472
Title :
Modeling heterogeneous data sources for speech recognition using synchronous hidden Markov models
Author :
Yong Zhao ; Biing-Hwang Juang
Author_Institution :
Dept. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
fYear :
2013
Firstpage :
7403
Lastpage :
7407
Abstract :
In this paper, we propose a novel acoustic modeling framework, the synchronous HMM, which takes full advantage of the capacity of heterogeneous data sources and achieves an optimal balance between modeling accuracy and robustness. The synchronous HMM introduces an additional layer of substates between the HMM states and the Gaussian component variables. The substates can register long-span non-phonetic attributes, which are collectively called speech scenes in this study. This hierarchical modeling scheme allows an accurate description of the probability distribution of speech units in different speech scenes. To address the data sparsity problem, a decision-based clustering algorithm is presented to determine the set of speech scenes and to tie the substate parameters. Moreover, we propose the multiplex Viterbi algorithm to efficiently decode synchronous HMMs within a search space of the same size as that of standard HMMs. Experiments on the Aurora 2 task show that synchronous HMMs produce a significant improvement in recognition performance over the HMM baseline at the expense of a moderate increase in memory requirement and computational complexity.
Keywords :
Gaussian distribution; decoding; hidden Markov models; maximum likelihood estimation; pattern clustering; speech recognition; Aurora 2 task; Gaussian component variables; HMM baseline; acoustic modeling framework; computational complexity; decision-based clustering algorithm; heterogeneous data sources; long-span nonphonetic attributes; memory requirement; multiplex Viterbi algorithm; probability distribution; speech scenes; synchronous HMM; synchronous hidden Markov models; Computational modeling; Decision trees; Multiplexing; Speech; Viterbi algorithm; system combination
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639101
Filename :
6639101