• DocumentCode
    13833
  • Title
    Convolutional Neural Networks for Distant Speech Recognition
  • Author
    Swietojanski, Pawel; Ghoshal, Arnab; Renals, Steve
  • Author_Institution
    Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK
  • Volume
    21
  • Issue
    9
  • fYear
    2014
  • fDate
    Sept. 2014
  • Firstpage
    1120
  • Lastpage
    1124
  • Abstract
    We investigate convolutional neural networks (CNNs) for large vocabulary distant speech recognition, trained using speech recorded from a single distant microphone (SDM) and multiple distant microphones (MDM). In the MDM case we explore a beamformed signal input representation compared with the direct use of multiple acoustic channels as a parallel input to the CNN. We have explored different weight sharing approaches, and propose a channel-wise convolution with two-way pooling. Our experiments, using the AMI meeting corpus, found that CNNs improve the word error rate (WER) by 6.5% relative compared to conventional deep neural network (DNN) models and 15.7% over a discriminatively trained Gaussian mixture model (GMM) baseline. For cross-channel CNN training, the WER improves by 3.5% relative over the comparable DNN structure. Compared with the best beamformed GMM system, cross-channel convolution reduces the WER by 9.7% relative, and matches the accuracy of a beamformed DNN.
  • Keywords
    Gaussian processes; array signal processing; convolution; microphones; neural nets; signal representation; speech recognition; AMI meeting corpus; DNN models; GMM baseline; MDM; SDM; WER; beamformed signal input representation; channel-wise convolution; convolutional neural networks; cross-channel CNN training; deep neural network model; discriminatively trained Gaussian mixture model; large vocabulary distant speech recognition; multiple acoustic channels; multiple distant microphones; single distant microphone; two-way pooling; weight sharing approach; word error rate; Acoustics; Convolution; Hidden Markov models; Microphones; Neural networks; Speech recognition; Vectors; AMI corpus; convolutional neural networks; deep neural networks; distant speech recognition; meetings
  • fLanguage
    English
  • Journal_Title
    Signal Processing Letters, IEEE
  • Publisher
    IEEE
  • ISSN
    1070-9908
  • Type
    jour
  • DOI
    10.1109/LSP.2014.2325781
  • Filename
    6819043