DocumentCode :
2970481
Title :
MAP estimation of online mapping parameters in ensemble speaker and speaking environment modeling
Author :
Tsao, Yu ; Matsuda, Shigeki ; Nakamura, Satoshi ; Lee, Chin-Hui
Author_Institution :
Spoken Language Commun. Group, Nat. Inst. of Inf. & Commun. Technol., Seika, Japan
fYear :
2009
fDate :
Nov. 13 2009-Dec. 17 2009
Firstpage :
271
Lastpage :
275
Abstract :
Recently, an ensemble speaker and speaking environment modeling (ESSEM) framework was proposed to enhance automatic speech recognition performance under adverse conditions. In the online phase of ESSEM, the environment structure prepared in the offline stage is transformed by a mapping function into a set of acoustic models for the target testing environment. In the original ESSEM framework, the mapping function parameters are estimated using a maximum likelihood (ML) criterion. In this study, we propose using a maximum a posteriori (MAP) criterion to estimate the mapping function parameters in order to avoid a possible over-fitting problem that can degrade the accuracy of environment characterization. For the MAP estimation, we also study two types of prior densities, namely clustered priors and hierarchical priors. On the Aurora-2 task, with either type of prior density, MAP-based ESSEM achieves better performance than ML-based ESSEM, especially under low SNR conditions. Compared with our best baseline results, MAP-based ESSEM achieves a 14.97% relative word error rate reduction (from 5.41% to 4.60%) on average over the three testing sets at signal-to-noise ratios from 0 dB to 20 dB.
Keywords :
maximum likelihood estimation; speech recognition; Aurora-2 task; MAP estimation; automatic speech recognition performance; ensemble speaker; maximum likelihood criterion; maximum posteriori criterion; online mapping parameter; speaking environment modeling; target testing environment; word error rate reduction; Acoustic distortion; Acoustic testing; Automatic speech recognition; Degradation; Hidden Markov models; Loudspeakers; Maximum likelihood estimation; Maximum likelihood linear regression; Samarium; Signal to noise ratio;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location :
Merano
Print_ISBN :
978-1-4244-5478-5
Electronic_ISBN :
978-1-4244-5479-2
Type :
conf
DOI :
10.1109/ASRU.2009.5373236
Filename :
5373236