DocumentCode
177459
Title
Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition
Author
Toth, Laszlo
Author_Institution
MTA-SZTE Res. Group on Artificial Intell., Univ. of Szeged, Szeged, Hungary
fYear
2014
fDate
4-9 May 2014
Firstpage
190
Lastpage
194
Abstract
Convolutional neural networks have proved very successful in image recognition, thanks to their tolerance to small translations. They have recently been applied to speech recognition as well, using a spectral representation as input. However, in this case the translations along the two axes - time and frequency - should be handled quite differently. So far, most authors have focused on convolution along the frequency axis, which offers invariance to speaker and speaking style variations. Other researchers have developed a different network architecture that applies time-domain convolution in order to process a longer time-span of input in a hierarchical manner. These two approaches have different background motivations, and both offer significant gains over a standard fully connected network. Here we show that the two network architectures can be readily combined, along with their advantages. With the combined model we report an error rate of 16.7% on the TIMIT phone recognition task, a new record on this dataset.
Keywords
convolution; frequency-domain analysis; image recognition; neural nets; speaker recognition; telecommunication computing; time-domain analysis; TIMIT phone recognition task; convolutional neural network; error rate; frequency axis; frequency-domain convolution; speaker invariance; speaking style variations; spectral representation; speech recognition; time-domain convolution; Biological neural networks; Error analysis; Time-frequency analysis; Training; Deep neural network; TIMIT; rectified linear unit
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location
Florence
Type
conf
DOI
10.1109/ICASSP.2014.6853584
Filename
6853584