DocumentCode :
257804
Title :
Discriminatively trained recurrent neural networks for single-channel speech separation
Author :
Weninger, Felix ; Hershey, John R. ; Le Roux, Jonathan ; Schuller, Björn
Author_Institution :
Machine Intell. & Signal Process. Group (MISP), Tech. Univ. München, Munich, Germany
fYear :
2014
fDate :
3-5 Dec. 2014
Firstpage :
577
Lastpage :
581
Abstract :
This paper describes an in-depth investigation of training criteria, network architectures and feature representations for regression-based single-channel speech separation with deep neural networks (DNNs). We use a generic discriminative training criterion corresponding to optimal source reconstruction from time-frequency masks, and introduce its application to speech separation in a reduced feature space (Mel domain). A comparative evaluation of time-frequency mask estimation by DNNs, recurrent DNNs and non-negative matrix factorization on the 2nd CHiME Speech Separation and Recognition Challenge shows consistent improvements by discriminative training, whereas long short-term memory recurrent DNNs obtain the overall best results. Furthermore, our results confirm the importance of fine-tuning the feature representation for DNN training.
Keywords :
learning (artificial intelligence); matrix decomposition; recurrent neural nets; regression analysis; signal reconstruction; source separation; speech intelligibility; speech processing; 2nd CHiME Speech Separation and Recognition Challenge; deep neural networks; discriminatively trained recurrent neural networks; feature representation fine tuning; generic discriminative training criterion; long short-term memory recurrent DNN; network architectures; nonnegative matrix factorization; optimal source reconstruction; reduced feature space; regression; single-channel speech separation; time-frequency mask estimation; Approximation methods; Discrete Fourier transforms; Recurrent neural networks; Speech; Speech recognition; Training; discriminative training; speech enhancement
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on
Conference_Location :
Atlanta, GA
Type :
conf
DOI :
10.1109/GlobalSIP.2014.7032183
Filename :
7032183