Regional Information Center for Science and Technology - MAP estimation of online mapping parameters in ensemble speaker and speaking environment modeling

DocumentCode :

2970481

Title :

MAP estimation of online mapping parameters in ensemble speaker and speaking environment modeling

Author :

Tsao, Yu ; Matsuda, Shigeki ; Nakamura, Satoshi ; Lee, Chin-Hui

Author_Institution :

Spoken Language Commun. Group, Nat. Inst. of Inf. & Commun. Technol., Seika, Japan

fYear :

2009

fDate :

Nov. 13 2009-Dec. 17 2009

Firstpage :

271

Lastpage :

275

Abstract :

Recently, an ensemble speaker and speaking environment modeling (ESSEM) framework was proposed to enhance automatic speech recognition performance under adverse conditions. In the online phase of ESSEM, the environment structure prepared in the offline stage is transformed by a mapping function into a set of acoustic models for the target testing environment. In the original ESSEM framework, the mapping function parameters are estimated using a maximum likelihood (ML) criterion. In this study, we propose using a maximum a posteriori (MAP) criterion to estimate the mapping function, in order to avoid possible over-fitting that can degrade the accuracy of environment characterization. For the MAP estimation, we also study two types of prior densities, namely clustered priors and hierarchical priors. On the Aurora-2 task, with either type of prior density, MAP-based ESSEM achieves better performance than ML-based ESSEM, especially under low-SNR conditions. Compared with our best baseline results, MAP-based ESSEM achieves a 14.97% relative word error rate reduction (from 5.41% to 4.60%), averaged over signal-to-noise ratios from 0 dB to 20 dB across the three test sets.
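
The contrast between the ML and MAP criteria mentioned in the abstract can be illustrated with a toy case. The sketch below is not the ESSEM mapping function, which this record does not specify; it assumes a single scalar bias parameter, Gaussian noise with known variance, and a Gaussian prior. The names `ml_bias`, `map_bias`, and the sample values are hypothetical. It only shows how a prior pulls the estimate away from an over-fitted ML value when little adaptation data is available.

```python
from statistics import fmean


def ml_bias(residuals):
    """ML estimate of a scalar bias: the sample mean of the residuals."""
    return fmean(residuals)


def map_bias(residuals, noise_var, prior_mean, prior_var):
    """MAP estimate of the same bias under an assumed Gaussian prior
    N(prior_mean, prior_var) and Gaussian noise with variance noise_var.

    This is the standard conjugate-Gaussian posterior mean (which equals
    the posterior mode), used here purely as an illustration.
    """
    n = len(residuals)
    return (prior_var * sum(residuals) + noise_var * prior_mean) / (
        n * prior_var + noise_var
    )


# Hypothetical adaptation data: only two noisy residuals are available.
few = [2.0, 4.0]
ml_few = ml_bias(few)                                                   # 3.0
map_few = map_bias(few, noise_var=1.0, prior_mean=0.0, prior_var=1.0)   # 2.0

# With much more data, the prior's influence fades and MAP approaches ML.
many = [3.0] * 1000
map_many = map_bias(many, noise_var=1.0, prior_mean=0.0, prior_var=1.0)

print(ml_few, map_few, round(map_many, 3))
```

With two observations the MAP estimate is shrunk toward the prior mean, while with a thousand observations it is nearly identical to the ML estimate. How the clustered and hierarchical priors named in the abstract are actually constructed is not described in this record, so they are not modeled here.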

Keywords :

maximum likelihood estimation; speech recognition; Aurora-2 task; MAP estimation; automatic speech recognition performance; ensemble speaker; maximum likelihood criterion; maximum posteriori criterion; online mapping parameter; speaking environment modeling; target testing environment; word error rate reduction; Acoustic distortion; Acoustic testing; Automatic speech recognition; Degradation; Hidden Markov models; Loudspeakers; Maximum likelihood estimation; Maximum likelihood linear regression; Samarium; Signal to noise ratio;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on

Conference_Location :

Merano

Print_ISBN :

978-1-4244-5478-5

Electronic_ISBN :

978-1-4244-5479-2

Type :

conf

DOI :

10.1109/ASRU.2009.5373236

Filename :

5373236

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2970481