Title :
Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech
Author :
Xiao, Xiong ; Chng, Eng Siong ; Li, Haizhou
Author_Institution :
Temasek Lab., Nanyang Technol. Univ., Singapore, Singapore
Abstract :
In this paper, we propose a framework for joint normalization of the spectral and temporal statistics of speech features for robust speech recognition. Current feature normalization approaches normalize the spectral and temporal aspects of feature statistics separately to overcome noise and reverberation. As a result, the interaction between spectral normalization (e.g. mean and variance normalization, MVN) and temporal normalization (e.g. temporal structure normalization, TSN) is ignored. We propose a joint spectral and temporal normalization (JSTN) framework that normalizes these two aspects of feature statistics simultaneously. In JSTN, feature trajectories are filtered by linear filters, and the filters' coefficients are optimized by maximizing a likelihood-based objective function. Experimental results on the Aurora-5 benchmark task show that JSTN consistently outperforms the cascade of MVN and TSN on test data corrupted by both additive noise and reverberation, which validates our proposal. Specifically, JSTN achieves a relative reduction of 8-9% in average word error rate over the cascade of MVN and TSN, for both artificial and real noisy data.
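As a rough illustration of the two operations the abstract refers to, the sketch below applies MVN to a single feature trajectory (one feature coefficient over time) and then passes it through a linear FIR filter along the time axis. This is only a minimal sketch under stated assumptions: the function names (`mvn`, `fir_filter`, `mvn_then_filter`), the edge handling, and the filter taps are hypothetical and not taken from the paper. In particular, the taps here are fixed by hand, whereas TSN designs its filter from the data and JSTN optimizes the coefficients with a likelihood-based objective; neither is implemented here.

```python
import math


def mvn(trajectory):
    """Mean and variance normalization of one feature trajectory.

    The trajectory is the sequence of values of a single feature
    coefficient across the frames of an utterance.
    """
    n = len(trajectory)
    mean = sum(trajectory) / n
    var = sum((x - mean) ** 2 for x in trajectory) / n
    std = math.sqrt(var) or 1.0  # constant trajectory: avoid dividing by zero
    return [(x - mean) / std for x in trajectory]


def fir_filter(trajectory, taps):
    """Filter a trajectory along time with a centered FIR filter.

    Assumption: frames beyond either end are replaced by the nearest
    edge frame. The taps are placeholders, not learned coefficients.
    """
    half = len(taps) // 2
    last = len(trajectory) - 1
    out = []
    for t in range(len(trajectory)):
        acc = 0.0
        for k, w in enumerate(taps):
            idx = min(max(t + k - half, 0), last)  # clamp to valid frames
            acc += w * trajectory[idx]
        out.append(acc)
    return out


def mvn_then_filter(trajectory, taps):
    """Cascade: spectral-statistics normalization, then temporal filtering."""
    return fir_filter(mvn(trajectory), taps)
```

For example, `mvn_then_filter(c0, [0.25, 0.5, 0.25])` would normalize and then smooth a hypothetical trajectory `c0`. Because the two stages run independently in such a cascade, the filter is chosen without regard to how it changes the mean and variance that the first stage just normalized, which is the interaction the abstract says a joint approach is meant to account for.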
Keywords :
maximum likelihood estimation; reverberation; speech recognition; Aurora-5 benchmark task; JSTN framework; MVN; TSN; additive noise; feature normalization approach; feature statistics; feature trajectory; joint spectral and temporal normalization; likelihood-based objective function; linear filter; mean-and-variance normalization; noisy speech recognition; reverberated speech recognition; robust speech recognition; spectral statistics; temporal statistics; temporal structure normalization; Linear programming; Robustness; Speech; Trajectory; Vectors; dereverberation; feature normalization
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-0045-2
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2012.6288876