• DocumentCode
    178889
  • Title
    Single-channel speech separation with memory-enhanced recurrent neural networks
  • Author
    Weninger, Felix ; Eyben, Florian ; Schuller, Bjorn
  • Author_Institution
    Machine Intell. & Signal Process. Group, Tech. Univ. München, München, Germany
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    3709
  • Lastpage
    3713
  • Abstract
    In this paper we propose the use of Long Short-Term Memory recurrent neural networks for speech enhancement. Networks are trained to predict clean speech as well as noise features from noisy speech features, and a magnitude domain soft mask is constructed from these features. Extensive tests are run on 73 k noisy and reverberated utterances from the Audio-Visual Interest Corpus of spontaneous, emotionally colored speech, degraded by several hours of real noise recordings comprising stationary and non-stationary sources and convolutive noise from the Aachen Room Impulse Response database. The proposed method is shown to provide superior noise reduction at low signal-to-noise ratios while creating very few artifacts at higher signal-to-noise ratios, thereby outperforming unsupervised magnitude domain spectral subtraction by a large margin in terms of source-distortion ratio.
  • Keywords
    acoustic convolution; learning (artificial intelligence); recurrent neural nets; reverberation; speech enhancement; Aachen Room Impulse Response database; audio-visual interest corpus; clean speech prediction; convolutive noise; long-term memory recurrent neural networks; magnitude domain soft mask; memory-enhanced recurrent neural network training; noise reduction; noise speech feature prediction; noisy utterances; nonstationary sources; real noise recordings; reverberated utterances; short-term memory recurrent neural networks; signal-to-noise ratios; single-channel speech separation; source-distortion ratio; speech enhancement; spontaneous-emotionally colored speech; stationary sources; Estimation; Noise; Noise measurement; Recurrent neural networks; Speech; Speech enhancement; Training; Long Short-Term Memory; Speech enhancement; recurrent neural networks; speech separation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6854294
  • Filename
    6854294
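
The abstract mentions a magnitude domain soft mask built from the predicted clean speech and noise features. One common way to form such a mask is the ratio S / (S + N), applied element-wise to the noisy magnitude spectrogram. The sketch below illustrates that idea only; the exact mask definition used in the paper is not given in this record, and the function names `soft_mask` and `enhance`, the `eps` parameter, and the assumption that the predictions are already non-negative magnitude spectra are all hypothetical.

```python
import numpy as np


def soft_mask(speech_mag, noise_mag, eps=1e-8):
    """Ratio mask in the magnitude domain: S / (S + N), values in [0, 1].

    speech_mag, noise_mag: arrays of the same shape holding the predicted
    speech and noise magnitudes (assumed layout: frames x frequency bins).
    eps avoids division by zero where both predictions are zero.
    """
    speech_mag = np.maximum(np.asarray(speech_mag, dtype=float), 0.0)
    noise_mag = np.maximum(np.asarray(noise_mag, dtype=float), 0.0)
    return speech_mag / (speech_mag + noise_mag + eps)


def enhance(noisy_mag, speech_mag, noise_mag):
    """Apply the soft mask element-wise to the noisy magnitude spectrogram."""
    return soft_mask(speech_mag, noise_mag) * np.asarray(noisy_mag, dtype=float)


if __name__ == "__main__":
    # Toy 2x2 example standing in for network predictions.
    speech = np.array([[3.0, 0.0], [1.0, 2.0]])
    noise = np.array([[1.0, 4.0], [1.0, 0.0]])
    noisy = np.array([[4.0, 4.0], [2.0, 2.0]])
    print(soft_mask(speech, noise))
    print(enhance(noisy, speech, noise))
```

Bins where the predicted noise dominates are attenuated towards zero, while bins dominated by predicted speech pass through nearly unchanged, which matches the abstract's description of strong noise reduction at low signal-to-noise ratios with few artifacts at higher ones.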