Analysis-by-synthesis feature estimation for robust automatic speech recognition using spectral masks

Author

Mandel, Michael I.; Narayanan, Arun

fYear

2014

fDate

4-9 May 2014

Firstpage

2509

Lastpage

2513

Abstract

Spectral masking is a promising method for noise suppression in which regions of the spectrogram that are dominated by noise are attenuated while regions dominated by speech are preserved. It is not clear, however, how best to combine spectral masking with the non-linear processing necessary to compute automatic speech recognition features. We propose an analysis-by-synthesis approach to automatic speech recognition, which, given a spectral mask, poses the estimation of mel frequency cepstral coefficients (MFCCs) of the clean speech as an optimization problem. MFCCs are found that minimize a combination of the distance from the resynthesized clean power spectrum to the regions of the noisy spectrum selected by the mask and the negative log likelihood under an unmodified large vocabulary continuous speech recognizer. In evaluations on the Aurora4 noisy speech recognition task with both ideal and estimated masks, analysis-by-synthesis decreases both word error rates and distances to clean speech as compared to traditional approaches.
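The optimization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It makes several simplifying assumptions that do not come from the record: the distance is a squared error in the log mel domain rather than on the resynthesized power spectrum, the recognizer's negative log likelihood is replaced by a single diagonal Gaussian over the cepstra, and the minimizer is plain gradient descent with numerical gradients. All function and parameter names (`idct`, `dct`, `cost`, `estimate`, `weight`, `step`) are hypothetical.

```python
import math


def idct(cepstra, n_bands):
    """Orthonormal inverse DCT-II: cepstral coefficients -> log mel energies."""
    out = []
    for m in range(n_bands):
        total = cepstra[0] / math.sqrt(n_bands)
        for k in range(1, len(cepstra)):
            angle = math.pi * k * (m + 0.5) / n_bands
            total += math.sqrt(2.0 / n_bands) * cepstra[k] * math.cos(angle)
        out.append(total)
    return out


def dct(log_mel, n_ceps):
    """Orthonormal DCT-II: log mel energies -> first n_ceps cepstral coefficients."""
    n = len(log_mel)
    out = []
    for k in range(n_ceps):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x * math.cos(math.pi * k * (m + 0.5) / n)
                               for m, x in enumerate(log_mel)))
    return out


def cost(cepstra, noisy_log_mel, mask, prior_mean, prior_var, weight):
    """Masked resynthesis error plus a weighted Gaussian negative log likelihood.

    Assumption: the Gaussian term is only a stand-in for the recognizer likelihood.
    """
    resynth = idct(cepstra, len(noisy_log_mel))
    # Only bands the mask marks as speech-dominated (mask near 1) contribute.
    data = sum(w * (r - y) ** 2 for w, r, y in zip(mask, resynth, noisy_log_mel))
    nll = 0.5 * sum((c - mu) ** 2 / v
                    for c, mu, v in zip(cepstra, prior_mean, prior_var))
    return data + weight * nll


def estimate(noisy_log_mel, mask, prior_mean, prior_var,
             weight=0.1, step=0.1, iters=500, eps=1e-5):
    """Gradient descent on cost(), using central-difference gradients."""
    cepstra = list(prior_mean)
    args = (noisy_log_mel, mask, prior_mean, prior_var, weight)
    for _ in range(iters):
        grad = []
        for k in range(len(cepstra)):
            up, down = cepstra[:], cepstra[:]
            up[k] += eps
            down[k] -= eps
            grad.append((cost(up, *args) - cost(down, *args)) / (2 * eps))
        cepstra = [c - step * g for c, g in zip(cepstra, grad)]
    return cepstra


# Toy frame: band 2 is noise-dominated, so the mask excludes it from the fit.
noisy = [1.0, 2.0, 5.0, 1.5]
mask = [1.0, 1.0, 0.0, 1.0]
mean, var = [0.0] * 4, [1.0] * 4
clean_cepstra = estimate(noisy, mask, mean, var)
```

In this sketch the masked band is filled in by the prior term instead of being copied from the noisy observation, which is the intuition behind combining the two terms in one objective.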

Keywords

cepstral analysis; feature extraction; optimisation; speech recognition; vocabulary; Aurora4 noisy speech recognition task; MFCC; analysis-by-synthesis feature estimation; mel frequency cepstral coefficients; negative log likelihood; noise suppression; nonlinear processing; optimization problem; robust automatic speech recognition; spectral masking; spectrogram regions; unmodified large vocabulary; word error rates; Hidden Markov models; Lattices; Noise; Optimization; Speech; Speech processing; Speech recognition; analysis-by-synthesis; large vocabulary automatic speech recognition; missing data; time-frequency masking

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6854052

Filename

6854052