Regional Information Center for Science and Technology - Deep Scattering Spectrum with deep neural networks

DocumentCode :

177468

Title :

Deep Scattering Spectrum with deep neural networks

Author :

Peddinti, Vijayaditya ; Sainath, Tara N. ; Maymon, Shay ; Ramabhadran, Bhuvana ; Nahamoo, David ; Goel, Vikas

Author_Institution :

Center for Language & Speech Process., Johns Hopkins Univ., Baltimore, MD, USA

fYear :

2014

fDate :

4-9 May 2014

Firstpage :

210

Lastpage :

214

Abstract :

State-of-the-art convolutional neural networks (CNNs) typically use a log-mel spectral representation of the speech signal. However, this representation is limited by the spectro-temporal resolution afforded by log-mel filter-banks. A novel technique known as Deep Scattering Spectrum (DSS) addresses this limitation and preserves higher-resolution information, while ensuring time-warp stability, through the cascaded application of the wavelet-modulus operator. The first-order scatter is equivalent to log-mel features, so standard CNN modeling techniques can be used with these features directly. However, the higher-order scatter, which preserves the higher-resolution information, presents new modeling challenges. This paper explores how to use DSS features effectively with CNN acoustic models. Specifically, we identify the normalization, neural network topology, and regularization techniques needed to model higher-order scatter effectively. Using these higher-order scatter features in conjunction with CNNs yields a relative improvement of 7% over log-mel features on TIMIT, giving a phonetic error rate (PER) of 17.4%, one of the lowest PERs reported to date on this task.
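
The cascaded wavelet-modulus operator mentioned in the abstract can be illustrated with a minimal two-layer sketch. This is not the authors' implementation: the function names (`gabor_bank`, `gaussian_lowpass`, `wavelet_modulus`, `scattering`), the Gaussian band-pass filters, the quality factor `q`, and all center frequencies are assumptions chosen for illustration. Mel-spaced filters, log compression, subsampling, and the normalization studied in the paper are omitted.

```python
import numpy as np

def gabor_bank(n, centers, q=5.0):
    """Hypothetical analytic band-pass filters, defined in the frequency domain.
    Each filter is a Gaussian centered at c (cycles/sample) with std c / q."""
    freqs = np.fft.fftfreq(n)
    return np.stack([np.exp(-0.5 * ((freqs - c) / (c / q)) ** 2) for c in centers])

def gaussian_lowpass(n, width):
    """Gaussian low-pass filter (frequency domain) used for time averaging."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-0.5 * (freqs / width) ** 2)

def wavelet_modulus(x, bank):
    """Apply every filter in `bank` along the last axis, then take the modulus.
    Input shape (..., n) gives output shape (..., num_filters, n)."""
    spectrum = np.fft.fft(x, axis=-1)
    return np.abs(np.fft.ifft(spectrum[..., None, :] * bank, axis=-1))

def average(u, phi):
    """Low-pass filter along the last axis (circular convolution via the FFT)."""
    return np.real(np.fft.ifft(np.fft.fft(u, axis=-1) * phi, axis=-1))

def scattering(x, bank1, bank2, phi):
    """First- and second-order scattering coefficients of a 1-D signal."""
    u1 = wavelet_modulus(x, bank1)    # (J1, n): envelope in each band
    s1 = average(u1, phi)             # first-order scatter
    u2 = wavelet_modulus(u1, bank2)   # (J1, J2, n): modulations of each envelope
    s2 = average(u2, phi)             # second-order scatter
    return s1, s2

# Toy input: an amplitude-modulated tone (illustrative values only).
n = 1024
t = np.arange(n)
carrier, mod = 128 / n, 16 / n
x = (1 + 0.5 * np.cos(2 * np.pi * mod * t)) * np.cos(2 * np.pi * carrier * t)

bank1 = gabor_bank(n, [0.0625, 0.125, 0.25])   # first-layer center frequencies
bank2 = gabor_bank(n, [mod / 2, mod, 2 * mod]) # second-layer (modulation) frequencies
phi = gaussian_lowpass(n, 0.005)
s1, s2 = scattering(x, bank1, bank2, phi)
print(s1.shape, s2.shape)
```

In this toy setup the first-order coefficients respond to the carrier band, while the second-order coefficients respond to the amplitude modulation that first-order averaging smooths away. That is the sense in which higher-order scatter retains finer-resolution information.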

Keywords :

neural nets; signal representation; signal resolution; speech processing; wavelet transforms; DSS; convolutional neural networks; deep neural networks; deep scattering spectrum; log-mel filter-banks; log-mel spectral representation; phonetic error rate; spectro-temporal resolution; speech signal; time warp stability; wavelet-modulus operator; Acoustics; Convolution; Decision support systems; Neural networks; Scattering; Speech

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

conf

DOI :

10.1109/ICASSP.2014.6853588

Filename :

6853588

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=177468