DocumentCode :
3429929
Title :
Evaluating Deep Scattering Spectra with deep neural networks on large scale spontaneous speech task
Author :
Fousek, Petr ; Dognin, Pierre ; Goel, Vaibhava
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
fYear :
2015
fDate :
19-24 April 2015
Firstpage :
4550
Lastpage :
4554
Abstract :
Deep Scattering Network features, introduced for image processing, have recently proved useful in speech recognition as an alternative to log-mel features for Deep Neural Network (DNN) acoustic models. Scattering features use a wavelet decomposition that directly produces log-frequency spectrograms, which are robust to local time warping and provide additional information in higher-order coefficients. This paper extends previous work by showing how scattering features perform on a state-of-the-art spontaneous speech recognition task using a DNN acoustic model. We revisit feature normalization and compression in an extensive study, with emphasis on comparing models of the same size. We observe that scattering features outperform the log-mel baseline in all conditions, with additional gains from multi-resolution processing.
Keywords :
image resolution; speech recognition; wavelet neural nets; DNN acoustic model; deep scattering neural network feature; feature normalization; higher-order coefficients; image processing; local time warping; log frequency spectrogram; multiresolution processing; state-of-the-art spontaneous speech recognition; wavelet decomposition; Acoustics; Decision support systems; Neural networks; Scattering; Speech; Speech recognition; Training; deep neural networks; deep scattering networks; sequence training criterion; spontaneous speech;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
Type :
conf
DOI :
10.1109/ICASSP.2015.7178832
Filename :
7178832