Title :
A deep neural network approach to speech bandwidth expansion
Author :
Kehuang Li ; Chin-Hui Lee
Author_Institution :
Sch. of ECE, Georgia Inst. of Technol., Atlanta, GA, USA
Abstract :
We propose a deep neural network (DNN) approach to speech bandwidth expansion (BWE) that estimates the spectral mapping function from narrowband speech (4 kHz bandwidth) to wideband speech (8 kHz bandwidth). Log-spectrum power is used as both the input and the output feature for the required nonlinear transformation, and DNNs are trained to realize this high-dimensional mapping function. Evaluated on a large-scale 10-hour test set, the DNN-expanded speech signals yield excellent objective quality, measured by segmental signal-to-noise ratio and log-spectral distortion, compared with conventional BWE based on Gaussian mixture models (GMMs). Subjective listening tests give a 69% preference score for DNN-expanded speech versus 31% for GMM-expanded speech when the phase information is assumed known. In tests under real operating conditions, where the phase information is imaged from the given narrowband signal, the preference rises to 84% versus 16%. Correct phase recovery can further improve the BWE performance of the proposed DNN method.
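The two objective measures named in the abstract can be sketched as follows. This is a minimal illustration under common textbook definitions, not the authors' evaluation code: the function names, the list-of-frames data layout, the `eps` guard, and the absence of per-frame SNR clamping are all assumptions made here.

```python
import math


def log_spectral_distortion(ref_lps, est_lps):
    """Mean over frames of the RMS log-spectral difference, in dB.

    ref_lps, est_lps: lists of frames, each a list of natural-log power
    values per frequency bin (hypothetical layout, not from the paper).
    """
    to_db = 10.0 / math.log(10.0)  # 10*log10(P) == to_db * ln(P)
    per_frame = []
    for ref, est in zip(ref_lps, est_lps):
        sq = [(to_db * (r - e)) ** 2 for r, e in zip(ref, est)]
        per_frame.append(math.sqrt(sum(sq) / len(sq)))
    return sum(per_frame) / len(per_frame)


def segmental_snr(ref_frames, est_frames, eps=1e-10):
    """Mean per-frame SNR in dB between time-domain frames (no clamping)."""
    snrs = []
    for ref, est in zip(ref_frames, est_frames):
        signal = sum(r * r for r in ref)
        noise = sum((r - e) ** 2 for r, e in zip(ref, est))
        # eps (an assumption) avoids division by zero on silent frames
        snrs.append(10.0 * math.log10((signal + eps) / (noise + eps)))
    return sum(snrs) / len(snrs)
```

Lower log-spectral distortion and higher segmental SNR both indicate that the expanded signal is closer to the wideband reference. Practical evaluations often clamp per-frame SNR to a fixed range before averaging; that step is omitted here.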
Keywords :
neural nets; phase estimation; speech processing; deep neural network approach; high-dimensional mapping function; log-spectral distortion; log-spectrum power; narrowband signal; nonlinear transformation; objective quality measures; phase information; phase recovery; segmental signal-to-noise ratio; spectral mapping function; speech bandwidth expansion; speech signals; Feature extraction; Narrowband; Neural networks; Speech; Training; Wideband; Deep neural network; spectrum mapping
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178801