DocumentCode :
3163147
Title :
Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech
Author :
Xiao, Xiong ; Chng, Eng Siong ; Li, Haizhou
Author_Institution :
Temasek Lab., Nanyang Technol. Univ., Singapore, Singapore
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
4325
Lastpage :
4328
Abstract :
In this paper, we propose a framework for joint normalization of spectral and temporal statistics of speech features for robust speech recognition. Current feature normalization approaches normalize the spectral and temporal aspects of feature statistics separately to overcome noise and reverberation. As a result, the interaction between spectral normalization (e.g. mean and variance normalization, MVN) and temporal normalization (e.g. temporal structure normalization, TSN) is ignored. We propose a joint spectral and temporal normalization (JSTN) framework to simultaneously normalize these two aspects of feature statistics. In JSTN, feature trajectories are filtered by linear filters, and the filters' coefficients are optimized by maximizing a likelihood-based objective function. Experimental results on the Aurora-5 benchmark task show that JSTN consistently outperforms the cascade of MVN and TSN on test data corrupted by both additive noise and reverberation, which validates our proposal. Specifically, JSTN reduces the average word error rate by a relative 8-9% over the cascade of MVN and TSN for both artificial and real noisy data.
Keywords :
maximum likelihood estimation; reverberation; speech recognition; Aurora-5 benchmark task; JSTN framework; MVN; TSN; additive noise; feature normalization approach; feature statistics; feature trajectory; joint spectral and temporal normalization; likelihood-based objective function; linear filter; mean-and-variance normalization; noisy speech recognition; reverberated speech recognition; robust speech recognition; spectral statistics; temporal statistics; temporal structure normalization; Linear programming; Robustness; Speech; Trajectory; Vectors; dereverberation; feature normalization
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6288876
Filename :
6288876