Title :
Deep Scattering Spectrum with deep neural networks
Author :
Peddinti, Vijayaditya ; Sainath, Tara N. ; Maymon, Shay ; Ramabhadran, Bhuvana ; Nahamoo, David ; Goel, Vikas
Author_Institution :
Center for Language & Speech Process., Johns Hopkins Univ., Baltimore, MD, USA
Abstract :
State-of-the-art convolutional neural networks (CNNs) typically use a log-mel spectral representation of the speech signal. However, this representation is limited by the spectro-temporal resolution afforded by log-mel filter-banks. A novel technique known as Deep Scattering Spectrum (DSS) addresses this limitation and preserves higher-resolution information, while ensuring time warp stability, through the cascaded application of the wavelet-modulus operator. The first-order scatter is equivalent to log-mel features, and standard CNN modeling techniques can be used directly with these features. However, the higher-order scatter, which preserves the higher-resolution information, presents new modeling challenges. This paper explores how to use DSS features effectively with CNN acoustic models. Specifically, we identify the normalization, neural network topology, and regularization techniques that effectively model higher-order scatter. The use of these higher-order scatter features, in conjunction with CNNs, results in a relative improvement of 7% compared to log-mel features on TIMIT, providing a phonetic error rate (PER) of 17.4%, one of the lowest reported PERs to date on this task.
Keywords :
neural nets; signal representation; signal resolution; speech processing; wavelet transforms; DSS; convolutional neural networks; deep neural networks; deep scattering spectrum; log-mel filter-banks; log-mel spectral representation; phonetic error rate; spectro-temporal resolution; speech signal; time warp stability; wavelet-modulus operator; Acoustics; Convolution; Decision support systems; Neural networks; Scattering; Speech
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6853588