DocumentCode :
730682
Title :
A deep neural network for time-domain signal reconstruction
Author :
Yuxuan Wang ; DeLiang Wang
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear :
2015
fDate :
19-24 April 2015
Firstpage :
4390
Lastpage :
4394
Abstract :
Supervised speech separation has achieved considerable success recently. Typically, a deep neural network (DNN) is used to estimate an ideal time-frequency mask, and clean speech is produced by feeding the mask-weighted output to a resynthesizer in a subsequent step. So far, the success of DNN-based separation lies mainly in improving human speech intelligibility. In this work, we propose a new deep network that directly reconstructs the time-domain clean signal through an inverse fast Fourier transform layer. The joint training of speech resynthesis and mask estimation yields improved objective quality while maintaining the objective intelligibility performance. The proposed system significantly outperforms a recent non-negative matrix factorization based separation system in both objective speech intelligibility and quality.
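The abstract describes applying an estimated time-frequency mask to the noisy spectrum and resynthesizing a time-domain signal through an inverse FFT; the paper's contribution is folding this resynthesis into the network itself so mask estimation and reconstruction are trained jointly. The classical (non-jointly-trained) operation being absorbed into that layer can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, frame length, hop size, and window choice are assumptions for the example.

```python
import numpy as np

def masked_resynthesis(noisy, mask, frame_len=256, hop=128):
    """Apply a T-F mask to the noisy signal's framed spectra and
    resynthesize a time-domain signal via inverse FFT + overlap-add.
    `mask` has shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = mask.shape[0]
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    for t in range(n_frames):
        frame = noisy[t * hop : t * hop + frame_len] * window
        spec = np.fft.rfft(frame)               # frame spectrum
        masked = mask[t] * spec                 # mask-weighted spectrum
        out[t * hop : t * hop + frame_len] += np.fft.irfft(masked, frame_len)
    return out

# Toy usage: an all-ones mask approximately reproduces the input
# (Hann analysis window, 50% overlap, overlap-add synthesis).
np.random.seed(0)
sig = np.random.randn(1024)
n_frames = (len(sig) - 256) // 128 + 1
mask = np.ones((n_frames, 129))                 # 129 = 256 // 2 + 1 rfft bins
rec = masked_resynthesis(sig, mask)
```

In the paper's network, the mask is produced by DNN layers and the inverse-FFT step is a fixed layer, so the reconstruction error in the time domain can be backpropagated through it to the mask estimator.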
Keywords :
fast Fourier transforms; inverse transforms; neural nets; signal reconstruction; speech intelligibility; time-frequency analysis; DNN-based separation; deep neural network; human speech intelligibility; ideal time-frequency mask; inverse fast Fourier transform layer; mask estimation; nonnegative matrix factorization based separation system; speech resynthesis; supervised speech separation; time-domain signal reconstruction; Noise measurement; Signal to noise ratio; Speech; Time-domain analysis; Time-frequency analysis; Training; Deep neural network; speech separation; time-domain signal; time-frequency masking;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
South Brisbane, QLD, Australia
Type :
conf
DOI :
10.1109/ICASSP.2015.7178800
Filename :
7178800