Regional Information Center for Science and Technology (RICEST) - Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech

DocumentCode :

3163147

Title :

Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech

Author :

Xiao, Xiong ; Chng, Eng Siong ; Li, Haizhou

Author_Institution :

Temasek Lab., Nanyang Technol. Univ., Singapore, Singapore

fYear :

2012

fDate :

25-30 March 2012

Firstpage :

4325

Lastpage :

4328

Abstract :

In this paper, we propose a framework for joint normalization of the spectral and temporal statistics of speech features for robust speech recognition. Current feature normalization approaches normalize the spectral and temporal aspects of feature statistics separately to overcome noise and reverberation. As a result, the interaction between spectral normalization (e.g., mean and variance normalization, MVN) and temporal normalization (e.g., temporal structure normalization, TSN) is ignored. We propose a joint spectral and temporal normalization (JSTN) framework to normalize these two aspects of feature statistics simultaneously. In JSTN, feature trajectories are filtered by linear filters, and the filters' coefficients are optimized by maximizing a likelihood-based objective function. Experimental results on the Aurora-5 benchmark task show that JSTN consistently outperforms the cascade of MVN and TSN on test data corrupted by both additive noise and reverberation, which validates our proposal. Specifically, JSTN reduces the average word error rate by a relative 8-9% compared with the cascade of MVN and TSN on both artificial and real noisy data.
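
The two ingredients named in the abstract, per-utterance MVN and linear filtering of feature trajectories, can be illustrated with a minimal sketch. This is not the authors' JSTN method: the function names, the synthetic features, and the fixed 3-tap smoothing filter are all assumptions made for illustration, whereas the paper optimizes the filter coefficients with a likelihood-based objective that this record does not specify.

```python
import numpy as np

def mvn(features):
    """Per-utterance mean and variance normalization.

    features: array of shape (num_frames, num_dims).
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    # Guard against division by zero for constant dimensions.
    return (features - mean) / np.maximum(std, 1e-8)

def filter_trajectories(features, taps):
    """Apply one FIR filter along time to each feature dimension."""
    columns = [
        np.convolve(features[:, d], taps, mode="same")
        for d in range(features.shape[1])
    ]
    return np.stack(columns, axis=1)

# Synthetic stand-in for a 200-frame, 13-dimensional feature matrix.
rng = np.random.default_rng(0)
feats = rng.normal(loc=3.0, scale=2.0, size=(200, 13))

# Hypothetical fixed smoothing filter; NOT taken from the paper.
taps = np.array([0.25, 0.5, 0.25])

normalized = mvn(feats)                           # spectral step
smoothed = filter_trajectories(normalized, taps)  # temporal step
```

In this sketch the filtering step changes the per-dimension variance that MVN had just set to one, which is one simple way to see that the two steps of a cascade interact.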

Keywords :

maximum likelihood estimation; reverberation; speech recognition; Aurora-5 benchmark task; JSTN framework; MVN; TSN; additive noise; feature normalization approach; feature statistics; feature trajectory; joint spectral and temporal normalization; joint spectral-and-temporal normalization; likelihood-based objective function; linear filter; mean-and-variance normalization; noisy speech recognition; reverberated speech recognition; reverberation; robust speech recognition; spectral statistics; temporal statistics; temporal structure normalization; Linear programming; Reverberation; Robustness; Speech; Speech recognition; Trajectory; Vectors; dereverberation; feature normalization; robust speech recognition; temporal structure normalization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location :

Kyoto

ISSN :

1520-6149

Print_ISBN :

978-1-4673-0045-2

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2012.6288876

Filename :

6288876

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3163147