DocumentCode :
3163172
Title :
A linear projection approach to environment modeling for robust speech recognition
Author :
Tsao, Yu ; Huang, Chien-Lin ; Matsuda, Shigeki ; Hori, Chiori ; Kashioka, Hideki
Author_Institution :
Res. Center for Inf. Technol. Innovation, Acad. Sinica, Taipei, Taiwan
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
4329
Lastpage :
4332
Abstract :
We propose using a linear projection (LP) function to transform multiple sets of acoustic models into a single set that characterizes the testing environment for robust automatic speech recognition. The LP function extends the linear regression (LR) function used in maximum likelihood linear regression (MLLR) and maximum a posteriori linear regression (MAPLR) by incorporating local information in the ensemble acoustic space, which enhances its environment modeling capacity. To estimate the nuisance parameters of the LP function, we developed maximum likelihood LP (MLLP) and maximum a posteriori LP (MAPLP) and derived a set of integrated prior (IP) densities for MAPLP. The IP densities integrate multiple knowledge sources: the training set, previously seen speech data, the current utterance, and a prepared tree structure. We evaluated the proposed MLLP and MAPLP on the Aurora-2 database in unsupervised model adaptation mode. Experimental results show that the LP function outperforms the LR function with both ML- and MAP-based estimates across different test conditions. Moreover, because the MAP-based estimate handles overfitting well, MAPLP clearly improves over MLLP. Compared with the baseline result, MAPLP provides a significant 10.99% word error rate reduction.
Keywords :
maximum likelihood estimation; regression analysis; speech recognition; Aurora-2 database; IP densities; LP function; MAP-based estimates; MAPLR; MLLR; acoustic models; acoustic space; automatic speech recognition; environment modeling capacity enhancement; integrated prior densities; linear projection approach; maximum a posteriori linear regression; maximum likelihood linear regression; robust speech recognition; speech data; unsupervised model adaptation manner; word error rate reduction; Acoustics; Hidden Markov models; IP networks; Speech; Testing; Training; Vectors; Acoustic Model Adaptation; Environment Modeling; Linear Projection; Robust Speech Recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6288877
Filename :
6288877