Title :
Deep neural networks for estimating speech model activations
Author :
Williamson, Donald S. ; Yuxuan Wang ; DeLiang Wang
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Abstract :
This paper presents an approach for improving the perceptual quality of speech separated from background noise at low signal-to-noise ratios. Our approach uses two stages of deep neural networks, where the first stage estimates the ideal ratio mask that separates speech from noise, and the second stage maps the ratio-masked speech to the clean speech activation matrices that are used for nonnegative matrix factorization (NMF). Supervised NMF systems make assumptions about the relationship between the activation and basis matrices that do not always hold. Other two-stage approaches combining masking with NMF reconstruction do not account for mask estimation errors. We show that the proposed algorithm achieves higher objective speech quality and intelligibility compared to these related methods.
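The two quantities named in the abstract, the ideal ratio mask and the NMF reconstruction from basis and activation matrices, can be illustrated with a minimal sketch. It is not the authors' implementation: the two neural networks are omitted, the mask formula is one common definition assumed here (the abstract does not state which is used), and the function names and toy values are hypothetical. Plain Python is used so the example has no dependencies.

```python
import math


def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """Per-bin ratio mask sqrt(S^2 / (S^2 + N^2)).

    Assumption: this is one common definition of the ideal ratio mask;
    the abstract does not specify the exact form.
    """
    return [
        [math.sqrt(s * s / (s * s + n * n + eps)) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_mag, noise_mag)
    ]


def nmf_reconstruct(basis, activations):
    """Approximate a magnitude spectrogram as basis (F x K) times activations (K x T)."""
    n_frames = len(activations[0])
    return [
        [sum(w * activations[k][t] for k, w in enumerate(row)) for t in range(n_frames)]
        for row in basis
    ]


# Toy data (hypothetical): 3 frequency bins, 2 basis vectors, 2 time frames.
W = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]   # speech basis matrix
H = [[2.0, 0.0], [1.0, 3.0]]               # speech activation matrix
speech = nmf_reconstruct(W, H)             # clean-speech magnitudes
noise = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

# Oracle mask computed from known speech and noise; in the described system
# a first-stage network would estimate this mask from the noisy input instead.
mask = ideal_ratio_mask(speech, noise)
```

In the approach the abstract describes, a second-stage network would estimate `H` from the ratio-masked speech, and the product of the basis and the estimated activations would give the enhanced spectrogram; here `H` is simply given.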
Keywords :
acoustic noise; acoustic signal processing; neural nets; speech intelligibility; NMF reconstruction; background noise; clean speech activation matrices; deep neural networks; higher objective speech quality; low signal-noise ratio; mask estimation errors; nonnegative matrix factorization; perceptual quality; ratio-masked speech; speech model activation; speech separation; supervised NMF system; Feature extraction; Hidden Markov models; Noise measurement; Signal to noise ratio; Spectrogram; Speech; deep neural network; nonnegative matrix factorization; speech quality; speech separation;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178945