  • DocumentCode
    6208
  • Title
    A Regression Approach to Speech Enhancement Based on Deep Neural Networks
  • Author
    Yong Xu; Jun Du; Li-Rong Dai; Chin-Hui Lee
  • Author_Institution
    Nat. Eng. Lab. for Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
  • Volume
    23
  • Issue
    1
  • fYear
    2015
  • fDate
    Jan. 2015
  • Firstpage
    7
  • Lastpage
    19
  • Abstract
    In contrast to conventional minimum mean square error (MMSE)-based noise reduction techniques, we propose a supervised method that enhances speech by finding a mapping function between noisy and clean speech signals based on deep neural networks (DNNs). To handle a wide range of additive noises in real-world situations, a large training set that encompasses many possible combinations of speech and noise types is first designed. A DNN architecture is then employed as a nonlinear regression function to ensure a powerful modeling capability. Several techniques are also proposed to improve the DNN-based speech enhancement system, including global variance equalization to alleviate the over-smoothing problem of the regression model, and dropout and noise-aware training strategies to further improve the generalization capability of DNNs to unseen noise conditions. Experimental results demonstrate that the proposed framework can achieve significant improvements in both objective and subjective measures over the conventional MMSE-based technique. It is also interesting to observe that the proposed DNN approach can suppress highly nonstationary noise well, which is generally difficult to handle. Furthermore, the resulting DNN model, trained with artificially synthesized data, is also effective in dealing with noisy speech data recorded in real-world scenarios, without generating the annoying musical artifacts commonly observed in conventional enhancement methods.
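    The abstract names noise-aware training and global variance (GV) equalization without giving their formulas. The following is a minimal Python sketch of one plausible reading of each, under stated assumptions rather than the paper's exact formulation: features are log-power spectra given as lists of frames; the noise estimate appended to every input frame is the per-bin mean of the first few frames (the `num_noise_frames` parameter and its default are hypothetical); and GV equalization rescales the enhanced features about their mean by beta = sqrt(GV_ref / GV_est), using a single variance pooled over all frames and bins. The DNN regression model itself is omitted.

```python
from typing import List

Frame = List[float]  # one frame of (assumed) log-power spectral features


def noise_aware_inputs(noisy: List[Frame], num_noise_frames: int = 6) -> List[Frame]:
    """Append a per-utterance noise estimate to every input frame.

    Assumption (hypothetical default): the first `num_noise_frames` frames
    contain no speech, so their per-bin average is a crude noise estimate.
    """
    if not noisy:
        return []
    t = min(num_noise_frames, len(noisy))
    dim = len(noisy[0])
    noise_est = [sum(frame[d] for frame in noisy[:t]) / t for d in range(dim)]
    # Each output frame is [original features..., noise estimate...].
    return [frame + noise_est for frame in noisy]


def global_variance(frames: List[Frame]) -> float:
    """Variance of all feature values, pooled over frames and bins (one scalar)."""
    values = [v for frame in frames for v in frame]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def gv_equalize(estimated: List[Frame], reference_gv: float) -> List[Frame]:
    """Scale deviations about the mean so the pooled variance equals reference_gv.

    Counteracts over-smoothing: a regression model tends to output features
    whose variance is smaller than that of clean speech.
    """
    values = [v for frame in estimated for v in frame]
    mean = sum(values) / len(values)
    est_gv = global_variance(estimated)
    if est_gv == 0.0:
        return [list(frame) for frame in estimated]  # nothing to rescale
    beta = (reference_gv / est_gv) ** 0.5
    return [[mean + beta * (v - mean) for v in frame] for frame in estimated]


if __name__ == "__main__":
    x = noise_aware_inputs([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]], num_noise_frames=2)
    print(x[0])  # features of frame 0 followed by the noise estimate
    eq = gv_equalize([[0.0, 2.0], [4.0, 6.0]], reference_gv=20.0)
    print(eq, global_variance(eq))
```

    In this sketch reference_gv would be measured on clean training targets; pooling the variance into one scalar keeps the spectral shape of each frame and only widens its dynamic range, whereas a per-bin variant would equalize each frequency bin separately.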
  • Keywords
    least mean squares methods; neural nets; regression analysis; speech enhancement; DNN architecture; MMSE based technique; deep neural networks; global variance equalization; musical artifact; noisy speech data; nonlinear regression function; over-smoothing problem; supervised method; IEEE transactions; Noise; Noise measurement; Speech; Speech enhancement; Training; Deep neural networks (DNNs); dropout; noise aware training; noise reduction; nonstationary noise
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2014.2364452
  • Filename
    6932438