Single-channel speech separation with memory-enhanced recurrent neural networks

Author

Weninger, Felix; Eyben, Florian; Schuller, Björn

Author_Institution

Machine Intell. & Signal Process. Group, Tech. Univ. München, München, Germany

fYear

2014

fDate

4-9 May 2014

Firstpage

3709

Lastpage

3713

Abstract

In this paper, we propose the use of Long Short-Term Memory recurrent neural networks for speech enhancement. The networks are trained to predict clean speech features as well as noise features from noisy speech features, and a magnitude-domain soft mask is constructed from these predictions. Extensive tests are run on 73 k noisy and reverberated utterances from the Audio-Visual Interest Corpus of spontaneous, emotionally colored speech, degraded by several hours of real noise recordings comprising stationary and non-stationary sources, and by convolutive noise from the Aachen Room Impulse Response database. The proposed method is shown to provide superior noise reduction at low signal-to-noise ratios while introducing very few artifacts at higher signal-to-noise ratios, thereby outperforming unsupervised magnitude-domain spectral subtraction by a large margin in terms of source-distortion ratio.
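
The masking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact mask formula, so the ratio-style mask below (speech estimate divided by the sum of the speech and noise estimates) is an assumption, as are the function names `soft_mask` and `enhance_frame` and the toy values. The network that would produce the speech and noise estimates is not shown.

```python
def soft_mask(speech_est, noise_est, eps=1e-8):
    """Ratio-style soft mask for one frame of magnitude features.

    Assumed form (not taken from the paper): mask = S / (S + N).
    Inputs are non-negative magnitude estimates per frequency bin;
    eps guards against division by zero when both estimates are 0.
    """
    return [s / (s + n + eps) for s, n in zip(speech_est, noise_est)]


def enhance_frame(noisy_mag, speech_est, noise_est):
    """Apply the soft mask to the noisy magnitude spectrum, bin by bin."""
    mask = soft_mask(speech_est, noise_est)
    return [m * y for m, y in zip(mask, noisy_mag)]


# Hypothetical three-bin frame: the estimates stand in for network outputs.
noisy = [1.0, 2.0, 4.0]
speech = [0.9, 1.0, 0.0]
noise = [0.1, 1.0, 4.0]

enhanced = enhance_frame(noisy, speech, noise)
```

In this toy frame, a bin dominated by speech passes almost unchanged, a bin with equal speech and noise estimates is halved, and a bin estimated as pure noise is suppressed to zero.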

Keywords

acoustic convolution; learning (artificial intelligence); recurrent neural nets; reverberation; speech enhancement; Aachen Room Impulse Response database; audio-visual interest corpus; clean speech prediction; convolutive noise; long-term memory recurrent neural networks; magnitude domain soft mask; memory-enhanced recurrent neural network training; noise reduction; noise speech feature prediction; noisy utterances; nonstationary sources; real noise recordings; reverberated utterances; short-term memory recurrent neural networks; signal-to-noise ratios; single-channel speech separation; source-distortion ratio; speech enhancement; spontaneous-emotionally colored speech; stationary sources; Estimation; Noise; Noise measurement; Recurrent neural networks; Speech; Speech enhancement; Training; Long Short-Term Memory; Speech enhancement; recurrent neural networks; speech separation;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6854294

Filename

6854294