Title :
Speech separation of a target speaker based on deep neural networks
Author :
Jun Du ; Yanhui Tu ; Yong Xu ; Lirong Dai ; Chin-Hui Lee
Author_Institution :
Univ. of Sci. & Technol. of China, Hefei, China
Abstract :
This paper proposes a novel data-driven approach based on deep neural networks (DNNs) for single-channel speech separation. A DNN is adopted to directly model the highly nonlinear relationship between the speech features of a target speaker and those of the mixed signals. Both supervised and semi-supervised scenarios are investigated. In the supervised mode, the identities of both the target speaker and the interfering speaker are provided, while in the semi-supervised mode only the target speaker is given. We propose training the DNN on mixtures of the target speaker with multiple interfering speakers, which is shown to generalize well to an unseen interferer in the separation stage. Experimental results demonstrate that the proposed framework achieves better separation results than a GMM-based approach in the supervised mode. More significantly, in the semi-supervised mode, which is believed to be the preferred mode in real-world operation, the DNN-based approach even outperforms the GMM-based approach operating in the supervised mode.
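Illustrative sketch :
The abstract describes a DNN trained to regress the target speaker's features from features of the mixed signal, using mixtures with many different interferers. The following is a minimal sketch of that core idea only; the feature type (log-power spectra), layer sizes, activation functions, and optimizer are assumptions for illustration and are not taken from the paper, and the synthetic tensors stand in for real mixture/target feature pairs.

# Minimal sketch (assumed setup, not the authors' exact architecture or features):
# a DNN regresses clean target-speaker features from mixed-signal features.
import torch
import torch.nn as nn

FEAT_DIM = 257   # assumed: log-power spectrum bins per frame
HIDDEN = 1024    # assumed hidden-layer width

# Frame-wise mapping: mixed-speech features -> target-speaker features.
model = nn.Sequential(
    nn.Linear(FEAT_DIM, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, FEAT_DIM),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()  # minimize error between predicted and clean target features

# Synthetic stand-in data: in practice these would be features of the target
# speaker mixed with multiple interfering speakers (the multi-interferer
# training idea from the abstract) and the corresponding clean target features.
mixed = torch.randn(4096, FEAT_DIM)
target = torch.randn(4096, FEAT_DIM)

for epoch in range(10):
    optimizer.zero_grad()
    pred = model(mixed)          # predicted target-speaker features
    loss = loss_fn(pred, target)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: mse={loss.item():.4f}")

At separation time, the trained network is applied frame by frame to the mixed-signal features, and the predicted target features are converted back to a waveform; an unseen interferer is handled because training covered many interfering speakers.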
Keywords :
Gaussian processes; feature extraction; mixture models; neural nets; speaker recognition; speech processing; DNN; GMM-based approach; data-driven approach; deep neural networks; interfering speaker; mixed signals; semi-supervised mode; speech features; speech separation; target speaker; Hidden Markov models; Neural networks; Predictive models; Signal to noise ratio; Speech; Speech processing; Training; single-channel speech separation; supervised mode;
Conference_Title :
2014 12th International Conference on Signal Processing (ICSP)
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4799-2188-1
DOI :
10.1109/ICOSP.2014.7015050