Title :
Convolutional Neural Networks for Distant Speech Recognition
Author :
Swietojanski, Pawel; Ghoshal, Arnab; Renals, Steve
Author_Institution :
Centre for Speech Technol. Res., Univ. of Edinburgh, Edinburgh, UK
Abstract :
We investigate convolutional neural networks (CNNs) for large vocabulary distant speech recognition, trained using speech recorded with a single distant microphone (SDM) and with multiple distant microphones (MDM). In the MDM case we compare a beamformed signal input representation with the direct use of multiple acoustic channels as parallel inputs to the CNN. We explore different weight-sharing approaches and propose a channel-wise convolution with two-way pooling. Our experiments on the AMI meeting corpus show that CNNs improve the word error rate (WER) by 6.5% relative to conventional deep neural network (DNN) models and by 15.7% relative to a discriminatively trained Gaussian mixture model (GMM) baseline. For cross-channel CNN training, the WER improves by 3.5% relative to the comparable DNN structure. Compared with the best beamformed GMM system, cross-channel convolution reduces the WER by 9.7% relative, and matches the accuracy of a beamformed DNN.
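To make the abstract's "channel-wise convolution with two-way pooling" concrete, the NumPy sketch below gives one hypothetical reading of the idea: each microphone channel has its own filters convolved along the frequency axis of its filter-bank features, and the resulting activations are max-pooled both along frequency and across channels. This is not the authors' code; the dimensions, the ReLU nonlinearity, the pooling order, and all function names are illustrative assumptions.

```python
# Minimal sketch (assumed, not from the paper): channel-wise convolution over
# per-microphone filter-bank features, then "two-way" max-pooling, i.e. pooling
# along frequency within each channel followed by a max across channels.
import numpy as np

def channel_wise_conv(feats, weights, bias):
    """feats:   (n_ch, n_bands)        one filter-bank frame per microphone
       weights: (n_ch, n_maps, width)  separate filters for each channel
       bias:    (n_ch, n_maps)
       returns: (n_ch, n_maps, n_bands - width + 1) rectified activations"""
    n_ch, n_bands = feats.shape
    _, n_maps, width = weights.shape
    out = np.zeros((n_ch, n_maps, n_bands - width + 1))
    for c in range(n_ch):                  # each microphone uses its own filters
        for m in range(n_maps):
            for b in range(n_bands - width + 1):
                out[c, m, b] = np.dot(weights[c, m], feats[c, b:b + width]) + bias[c, m]
    return np.maximum(out, 0.0)            # ReLU nonlinearity (assumed)

def two_way_pool(conv_out, pool_size):
    """Max-pool along frequency within each channel, then max across channels."""
    n_ch, n_maps, n_bands = conv_out.shape
    n_pooled = n_bands // pool_size
    freq_pooled = conv_out[:, :, :n_pooled * pool_size] \
        .reshape(n_ch, n_maps, n_pooled, pool_size).max(axis=-1)
    return freq_pooled.max(axis=0)         # cross-channel max -> (n_maps, n_pooled)

# Toy usage: 4 distant microphones, 40 mel bands, 8 feature maps, filter width 5
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 40))
w = 0.1 * rng.standard_normal((4, 8, 5))
b = np.zeros((4, 8))
pooled = two_way_pool(channel_wise_conv(feats, w, b), pool_size=3)
print(pooled.shape)   # (8, 12): would feed the fully connected layers of the network
```

The cross-channel max in two_way_pool is what lets the network consume multiple acoustic channels in parallel without beamforming; whether pooling is applied in exactly this order is an assumption here, and the paper should be consulted for the precise formulation.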
Keywords :
Gaussian processes; array signal processing; convolution; microphones; neural nets; signal representation; speech recognition; AMI meeting corpus; DNN models; GMM baseline; MDM; SDM; WER; beamformed signal input representation; channel-wise convolution; convolutional neural networks; cross-channel CNN training; deep neural network model; discriminatively trained Gaussian mixture model; large vocabulary distant speech recognition; multiple acoustic channels; multiple distant microphones; single distant microphone; two-way pooling; weight sharing approach; word error rate; Acoustics; Convolution; Hidden Markov models; Microphones; Neural networks; Speech recognition; Vectors; AMI corpus; convolutional neural networks; deep neural networks; distant speech recognition; meetings;
Journal_Title :
IEEE Signal Processing Letters
DOI :
10.1109/LSP.2014.2325781