• DocumentCode
    177459
  • Title
    Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition
  • Author
    Toth, Laszlo
  • Author_Institution
    MTA-SZTE Research Group on Artificial Intelligence, University of Szeged, Szeged, Hungary
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    190
  • Lastpage
    194
  • Abstract
    Convolutional neural networks have proved very successful in image recognition, thanks to their tolerance to small translations. They have recently been applied to speech recognition as well, using a spectral representation as input. In this case, however, translations along the two axes, time and frequency, should be handled quite differently. So far, most authors have focused on convolution along the frequency axis, which offers invariance to speaker and speaking style variations. Other researchers have developed a different network architecture that applies time-domain convolution in order to process a longer time span of input in a hierarchical manner. The two approaches have different motivations, and both offer significant gains over a standard fully connected network. Here we show that the two network architectures can be readily combined, along with their advantages. With the combined model we report an error rate of 16.7% on the TIMIT phone recognition task, a new record on this dataset.
  • Keywords
    convolution; frequency-domain analysis; image recognition; neural nets; speaker recognition; telecommunication computing; time-domain analysis; TIMIT phone recognition task; convolutional neural network; error rate; frequency axis; frequency-domain convolution; speaker invariance; speaking style variations; spectral representation; speech recognition; time-domain convolution; Biological neural networks; Error analysis; Time-frequency analysis; Training; Deep neural network; TIMIT; rectified linear unit
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6853584
  • Filename
    6853584