DocumentCode :
112670
Title :
Coupled Dictionaries for Exemplar-Based Speech Enhancement and Automatic Speech Recognition
Author :
Baby, Deepak ; Virtanen, Tuomas ; Gemmeke, Jort F. ; Van hamme, Hugo
Author_Institution :
Electr. Eng. Dept. (ESAT), KU Leuven, Leuven, Belgium
Volume :
23
Issue :
11
fYear :
2015
fDate :
Nov. 2015
Firstpage :
1788
Lastpage :
1799
Abstract :
Exemplar-based speech enhancement systems work by decomposing the noisy speech as a weighted sum of speech and noise exemplars stored in a dictionary and use the resulting speech and noise estimates to obtain a time-varying filter in the full-resolution frequency domain to enhance the noisy speech. To obtain the decomposition, exemplars sampled in lower dimensional spaces are preferred over the full-resolution frequency domain for their reduced computational complexity and the ability to better generalize to unseen cases. But the resulting filter may be sub-optimal as the mapping of the obtained speech and noise estimates to the full-resolution frequency domain yields a low-rank approximation. This paper proposes an efficient way to directly compute the full-resolution frequency estimates of speech and noise using coupled dictionaries: an input dictionary containing atoms from the desired exemplar space to obtain the decomposition and a coupled output dictionary containing exemplars from the full-resolution frequency domain. We also introduce modulation spectrogram features for the exemplar-based tasks using this approach. The proposed system was evaluated for various choices of input exemplars and yielded improved speech enhancement performances on the AURORA-2 and AURORA-4 databases. We further show that the proposed approach also results in improved word error rates (WERs) for the speech recognition tasks using HMM-GMM and deep-neural network (DNN) based systems.
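The decomposition and filtering steps summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It processes a single frame rather than multi-frame exemplars, and it assumes a KL-divergence multiplicative update for the non-negative activations. The input dictionary is simulated by pooling the full-resolution atoms with a hypothetical matrix `M`. The function names, the parameters, and the random data are all invented for the example.

```python
import numpy as np

def nmf_activations(y, D, n_iter=200, sparsity=0.0, eps=1e-12):
    """Non-negative weights x with D @ x ~ y, dictionary D held fixed.

    Assumption: KL-divergence multiplicative updates with an optional
    L1 sparsity penalty; the abstract does not name the solver.
    """
    x = np.ones(D.shape[1])
    denom = D.sum(axis=0) + sparsity + eps
    for _ in range(n_iter):
        ratio = y / (D @ x + eps)
        x *= (D.T @ ratio) / denom
    return x

def coupled_enhance(y_in, y_full, D_in, D_out, n_speech):
    """Decompose in the input space, reconstruct in the full-resolution space.

    The first n_speech columns of D_in and D_out are speech atoms and the
    remaining columns are noise atoms; column k of D_in is coupled to
    column k of D_out.
    """
    x = nmf_activations(y_in, D_in)
    speech = D_out[:, :n_speech] @ x[:n_speech]   # full-resolution speech estimate
    noise = D_out[:, n_speech:] @ x[n_speech:]    # full-resolution noise estimate
    gain = speech / (speech + noise + 1e-12)      # Wiener-like filter for this frame
    return gain * y_full, gain

# Toy data (hypothetical): 64 frequency bins pooled to 8 bands, 12 speech + 6 noise atoms.
rng = np.random.default_rng(0)
D_out = rng.random((64, 18))
M = rng.random((8, 64))          # stand-in for a low-dimensional feature mapping
D_in = M @ D_out                 # coupled input dictionary
y_full = D_out @ rng.random(18)  # synthetic noisy magnitude spectrum
enhanced, gain = coupled_enhance(M @ y_full, y_full, D_in, D_out, n_speech=12)
```

The weights are estimated only in the low-dimensional input space, while the filter is built from the coupled full-resolution atoms. This avoids mapping low-dimensional estimates back to the frequency domain.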
Keywords :
Gaussian processes; approximation theory; computational complexity; filtering theory; hidden Markov models; learning (artificial intelligence); mixture models; neural nets; signal denoising; speech enhancement; speech recognition; AURORA-2 database; AURORA-4 database; DNN based systems; HMM-GMM; automatic speech recognition; computational complexity reduction; coupled output dictionary; deep-neural network based systems; exemplar-based speech enhancement systems; full-resolution frequency domain; input dictionary; low-rank approximation; modulation spectrogram features; noise exemplars; noisy speech enhancement; time-varying filter; weighted sum-of-speech; Dictionaries; Discrete Fourier transforms; Modulation; Noise; Speech; Speech enhancement; Exemplar-based; modulation envelope; noise robust automatic speech recognition; non-negative sparse coding;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2015.2450491
Filename :
7138598