  • DocumentCode
    257804
  • Title
    Discriminatively trained recurrent neural networks for single-channel speech separation
  • Author
    Weninger, Felix ; Hershey, John R. ; Le Roux, Jonathan ; Schuller, Bjorn
  • Author_Institution
    Machine Intell. & Signal Process. Group (MISP), Tech. Univ. München, Munich, Germany
  • fYear
    2014
  • fDate
    3-5 Dec. 2014
  • Firstpage
    577
  • Lastpage
    581
  • Abstract
    This paper describes an in-depth investigation of training criteria, network architectures and feature representations for regression-based single-channel speech separation with deep neural networks (DNNs). We use a generic discriminative training criterion corresponding to optimal source reconstruction from time-frequency masks, and introduce its application to speech separation in a reduced feature space (Mel domain). A comparative evaluation of time-frequency mask estimation by DNNs, recurrent DNNs and non-negative matrix factorization on the 2nd CHiME Speech Separation and Recognition Challenge shows consistent improvements by discriminative training, whereas long short-term memory recurrent DNNs obtain the overall best results. Furthermore, our results confirm the importance of fine-tuning the feature representation for DNN training.
  • Keywords
    learning (artificial intelligence); matrix decomposition; recurrent neural nets; regression analysis; signal reconstruction; source separation; speech intelligibility; speech processing; 2nd CHiME Speech Separation and Recognition Challenge; deep neural networks; discriminatively trained recurrent neural networks; feature representation fine tuning; generic discriminative training criterion; long short-term memory recurrent DNN; network architectures; nonnegative matrix factorization; optimal source reconstruction; reduced feature space; regression; single-channel speech separation; time-frequency mask estimation; Approximation methods; Discrete Fourier transforms; Recurrent neural networks; Speech; Speech processing; Speech recognition; Training; discriminative training; speech enhancement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on
  • Conference_Location
    Atlanta, GA
  • Type
    conf
  • DOI
    10.1109/GlobalSIP.2014.7032183
  • Filename
    7032183
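
The abstract refers to a training criterion "corresponding to optimal source reconstruction from time-frequency masks" but gives no formula. As a rough illustration only, the sketch below assumes a squared-error form: the estimated mask is applied to the mixture magnitudes and compared with the clean speech magnitudes, rather than being compared with a target mask. The function name, the list-of-lists spectrogram layout, and the toy values are hypothetical and are not taken from the paper.

```python
def reconstruction_loss(mask, mixture_mag, speech_mag):
    """Squared error between the masked mixture and the clean speech.

    All arguments are lists of frames, each frame a list of per-bin
    magnitudes (an assumed layout, chosen for illustration).
    """
    total = 0.0
    for m_row, x_row, s_row in zip(mask, mixture_mag, speech_mag):
        for m, x, s in zip(m_row, x_row, s_row):
            # The error is measured on the reconstructed source m * x,
            # not on the mask value m itself.
            total += (m * x - s) ** 2
    return total


# Toy example: one frame, two frequency bins (made-up values).
mixture = [[2.0, 4.0]]
speech = [[1.0, 4.0]]

good_mask = [[0.5, 1.0]]   # reconstructs the speech magnitudes exactly
pass_through = [[1.0, 1.0]]  # leaves the mixture unchanged

loss_good = reconstruction_loss(good_mask, mixture, speech)
loss_pass = reconstruction_loss(pass_through, mixture, speech)
```

Under this assumed form, a mask error in a bin where the mixture has little energy contributes little to the loss, which is one way to read "optimal source reconstruction" as opposed to plain mask regression.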