Regional Information Center for Science and Technology (RICEST) - Discriminatively trained recurrent neural networks for single-channel speech separation

DocumentCode :

257804

Title :

Discriminatively trained recurrent neural networks for single-channel speech separation

Author :

Weninger, Felix ; Hershey, John R. ; Le Roux, Jonathan ; Schuller, Björn

Author_Institution :

Machine Intell. & Signal Process. Group (MISP), Tech. Univ. München, Munich, Germany

Year :

2014

Date :

3-5 Dec. 2014

Firstpage :

577

Lastpage :

581

Abstract :

This paper describes an in-depth investigation of training criteria, network architectures and feature representations for regression-based single-channel speech separation with deep neural networks (DNNs). We use a generic discriminative training criterion corresponding to optimal source reconstruction from time-frequency masks, and introduce its application to speech separation in a reduced feature space (Mel domain). A comparative evaluation of time-frequency mask estimation by DNNs, recurrent DNNs and non-negative matrix factorization on the 2nd CHiME Speech Separation and Recognition Challenge shows consistent improvements by discriminative training, whereas long short-term memory recurrent DNNs obtain the overall best results. Furthermore, our results confirm the importance of fine-tuning the feature representation for DNN training.
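As an illustration only, the difference between fitting a time-frequency mask directly and fitting the source that the mask reconstructs can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the array shapes (frequency bins by frames), the random toy spectra, and the simplification that speech and noise magnitudes add are all hypothetical choices made for the example.

```python
import numpy as np


def mask_approximation_loss(mask_est, mask_ref):
    """Squared error between an estimated mask and a reference mask."""
    return float(np.sum((mask_est - mask_ref) ** 2))


def signal_approximation_loss(mask_est, mix_mag, speech_mag):
    """Squared error between the masked mixture and the clean speech magnitudes.

    The mask is scored by the quality of the source it reconstructs,
    rather than by its distance to a reference mask.
    """
    return float(np.sum((mask_est * mix_mag - speech_mag) ** 2))


# Toy magnitude spectra (hypothetical data): 4 frequency bins, 6 frames.
rng = np.random.default_rng(0)
speech_mag = rng.random((4, 6))
noise_mag = rng.random((4, 6))

# Assumption for this sketch: magnitudes of speech and noise simply add.
mix_mag = speech_mag + noise_mag

# Under that assumption, the ratio mask reconstructs the speech exactly.
ideal_mask = speech_mag / mix_mag

# A deliberately poor estimate: pass the mixture through unchanged.
all_pass_mask = np.ones_like(mix_mag)

loss_ideal = signal_approximation_loss(ideal_mask, mix_mag, speech_mag)
loss_all_pass = signal_approximation_loss(all_pass_mask, mix_mag, speech_mag)
```

In this toy setting the reconstruction loss of the all-pass mask equals the summed squared noise magnitudes, while the ratio mask drives it to zero up to floating-point error. The same loss could be evaluated on Mel-domain features instead of full-resolution spectra, which is the reduced feature space the abstract refers to.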

Keywords :

learning (artificial intelligence); matrix decomposition; recurrent neural nets; regression analysis; signal reconstruction; source separation; speech intelligibility; speech processing; 2nd CHiME Speech Separation and Recognition Challenge; deep neural networks; discriminatively trained recurrent neural networks; feature representation fine tuning; generic discriminative training criterion; long short-term memory recurrent DNN; network architectures; nonnegative matrix factorization; optimal source reconstruction; reduced feature space; regression; single-channel speech separation; time-frequency mask estimation; Approximation methods; Discrete Fourier transforms; Recurrent neural networks; Speech; Speech processing; Speech recognition; Training; deep neural networks; discriminative training; speech enhancement;

Language :

English

Publisher :

IEEE

Conference_Title :

Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on

Conference_Location :

Atlanta, GA

Type :

conf

DOI :

10.1109/GlobalSIP.2014.7032183

Filename :

7032183

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=257804