• DocumentCode
    672367
  • Title

    Hybrid acoustic models for distant and multichannel large vocabulary speech recognition

  • Author

    Swietojanski, Pawel ; Ghoshal, Arnab ; Renals, Steve

  • Author_Institution
    Centre for Speech Technol. Res., Univ. of Edinburgh, Edinburgh, UK
  • fYear
    2013
  • fDate
    8-12 Dec. 2013
  • Firstpage
    285
  • Lastpage
    290
  • Abstract
    We investigate the application of deep neural network (DNN)-hidden Markov model (HMM) hybrid acoustic models for far-field speech recognition of meetings recorded using microphone arrays. We show that the hybrid models achieve significantly better accuracy than conventional systems based on Gaussian mixture models (GMMs). We observe up to 8% absolute word error rate (WER) reduction from a discriminatively trained GMM baseline when using a single distant microphone, and a 4-6% absolute WER reduction when using beamforming on various combinations of array channels. By training the networks on audio from multiple channels, we find that the networks can recover a significant part of the accuracy difference between the single distant microphone and beamformed configurations. Finally, we show that the accuracy of a network recognising speech from a single distant microphone can approach that of a multi-microphone setup by training with data from other microphones.
  • Keywords
    Gaussian processes; hidden Markov models; microphone arrays; neural nets; speech recognition; DNN; GMM; Gaussian mixture model; HMM; WER reduction; deep neural network; distant speech recognition; far field speech recognition; hidden Markov model; hybrid acoustic model; large vocabulary speech recognition; microphone array; multichannel speech recognition; single distant microphone; word error rate reduction; Acoustics; Array signal processing; Arrays; Microphones; Speech; Speech recognition; Training; Beamforming; Deep Neural Networks; Distant Speech Recognition; Meeting recognition; Microphone Arrays
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
  • Conference_Location
    Olomouc
  • Type

    conf

  • DOI
    10.1109/ASRU.2013.6707744
  • Filename
    6707744