Regional Information Center for Science and Technology - A linear projection approach to environment modeling for robust speech recognition

DocumentCode :

3163172

Title :

A linear projection approach to environment modeling for robust speech recognition

Author :

Tsao, Yu ; Huang, Chien-Lin ; Matsuda, Shigeki ; Hori, Chiori ; Kashioka, Hideki

Author_Institution :

Res. Center for Inf. Technol. Innovation, Acad. Sinica, Taipei, Taiwan

fYear :

2012

fDate :

25-30 March 2012

Firstpage :

4329

Lastpage :

4332

Abstract :

Use of a linear projection (LP) function to transform multiple sets of acoustic models into a single set of acoustic models is proposed for characterizing testing environments for robust automatic speech recognition. The LP function extends the linear regression (LR) function used in maximum likelihood linear regression (MLLR) and maximum a posteriori linear regression (MAPLR) by incorporating local information in the ensemble acoustic space to enhance the environment modeling capacity. To estimate the nuisance parameters of the LP function, we developed maximum likelihood LP (MLLP) and maximum a posteriori LP (MAPLP) and derived a set of integrated prior (IP) densities for MAPLP. The IP densities integrate multiple knowledge sources: the training set, previously seen speech data, the current utterance, and a prepared tree structure. We evaluated the proposed MLLP and MAPLP on the Aurora-2 database in an unsupervised model adaptation setting. Experimental results show that the LP function outperforms the LR function with both ML- and MAP-based estimates over different test conditions. Moreover, because the MAP-based estimate handles over-fitting well, MAPLP gives clear improvements over MLLP. Compared with the baseline, MAPLP provides a significant 10.99% word error rate reduction.
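The contrast between ML and MAP estimation described in the abstract can be illustrated with a small sketch. This record does not give the LP function's actual parameterization, so the code below is a hypothetical, simplified stand-in and not the authors' method. It assumes that each adapted Gaussian mean is a weighted sum of that Gaussian's means in K environment-specific model sets (one global weight vector w), that covariances are identity, that frames are hard-aligned to Gaussians, and that the MAP case uses a single Gaussian prior N(w0, I/tau) in place of the paper's integrated prior densities. Under these assumptions, maximizing the (prior-penalized) log-likelihood gives the normal equations (sum_t M_t^T M_t + tau I) w = sum_t M_t^T x_t + tau w0, where M_t stacks as columns the K environment means of the Gaussian aligned to frame x_t. Setting tau = 0 recovers the ML estimate. The helper names solve, estimate_weights and adapt_mean are illustrative only.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> Vector:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_weights(
    frames: Sequence[Tuple[int, Vector]],
    env_means: Sequence[Sequence[Vector]],
    prior: Optional[Sequence[float]] = None,
    tau: float = 0.0,
) -> Vector:
    """Estimate combination weights w: ML if tau == 0, MAP toward `prior` if tau > 0.

    Hypothetical simplified stand-in for the LP function (not the paper's form).
    frames: (gaussian_index, observation) pairs from an assumed hard alignment.
    env_means[k][g]: mean of Gaussian g in environment-specific model set k.
    """
    k = len(env_means)
    lhs = [[0.0] * k for _ in range(k)]  # accumulates sum_t M_t^T M_t
    rhs = [0.0] * k                      # accumulates sum_t M_t^T x_t
    for g, x in frames:
        cols = [env_means[j][g] for j in range(k)]  # columns of M_t
        for i in range(k):
            rhs[i] += sum(ci * xi for ci, xi in zip(cols[i], x))
            for j in range(k):
                lhs[i][j] += sum(ci * cj for ci, cj in zip(cols[i], cols[j]))
    if tau > 0:
        if prior is None:
            raise ValueError("MAP estimation (tau > 0) needs prior weights")
        for i in range(k):
            lhs[i][i] += tau          # prior precision added on the diagonal
            rhs[i] += tau * prior[i]  # pulls the solution toward the prior mean w0
    return solve(lhs, rhs)


def adapt_mean(env_means: Sequence[Sequence[Vector]], w: Sequence[float], g: int) -> Vector:
    """Adapted mean of Gaussian g: sum_k w[k] * env_means[k][g] (assumed form)."""
    dim = len(env_means[0][g])
    return [sum(wk * env_means[k][g][d] for k, wk in enumerate(w)) for d in range(dim)]


# Toy data: K = 2 environment sets, one 2-D Gaussian, one adaptation frame.
env_means = [[[1.0, 0.0]], [[0.0, 1.0]]]
frames = [(0, [0.8, 0.2])]
w_ml = estimate_weights(frames, env_means)                              # ~[0.8, 0.2]
w_map = estimate_weights(frames, env_means, prior=[0.5, 0.5], tau=1.0)  # ~[0.65, 0.35]
```

With tau = 0 the weights fit the single adaptation frame exactly; with tau = 1 they are pulled, in this example, halfway toward the prior weights. This shrinkage toward a prior is the general mechanism by which a MAP estimate limits over-fitting when adaptation data are scarce.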

Keywords :

maximum likelihood estimation; regression analysis; speech recognition; Aurora-2 database; IP densities; LP function; MAP-based estimates; MAPLR; MLLR; acoustic models; acoustic space; automatic speech recognition; environment modeling capacity enhancement; integrated prior densities; linear projection approach; maximum a posteriori linear regression; maximum likelihood linear regression; robust speech recognition; speech data; unsupervised model adaptation manner; word error rate reduction; Acoustics; Hidden Markov models; IP networks; Speech; Testing; Training; Vectors; Acoustic Model Adaptation; Environment Modeling; Linear Projection; Robust Speech Recognition;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location :

Kyoto

ISSN :

1520-6149

Print_ISBN :

978-1-4673-0045-2

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2012.6288877

Filename :

6288877

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3163172