• DocumentCode
    3744892
  • Title
    JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs
  • Author
    Vijayaditya Peddinti; Guoguo Chen; Vimal Manohar; Tom Ko; Daniel Povey; Sanjeev Khudanpur
  • Author_Institution
    Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD 21218, USA
  • fYear
    2015
  • Firstpage
    539
  • Lastpage
    546
  • Abstract
    Multi-style training, using data which emulates a variety of possible test scenarios, is a popular approach towards robust acoustic modeling. However, acoustic models capable of exploiting large amounts of training data in a comparatively short amount of training time are essential. In this paper we tackle the problem of reverberant speech recognition using 5500 hours of simulated reverberant data. We use a time-delay neural network (TDNN) architecture, which is capable of tackling long-term interactions between speech and corrupting sources in reverberant environments. By sub-sampling the outputs at TDNN layers across time steps, training time is substantially reduced. Combining this with distributed optimization we show that the TDNN can be trained in 3 days using up to 32 GPUs. Further, iVectors are used as an input to the neural network to perform instantaneous speaker and environment adaptation. Finally, recurrent neural network language models are applied to the lattices to further improve the performance. Our system is shown to provide state-of-the-art results in the IARPA ASpIRE challenge, with 26.5% WER on the dev test set.
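    The sub-sampling idea mentioned in the abstract can be illustrated with a minimal sketch (an assumption-laden toy in NumPy, not the authors' Kaldi implementation): a TDNN layer splices hidden activations at a small set of time offsets, and by evaluating deeper layers only at the frame indices their successors actually require, most per-frame computation is skipped. The function name `tdnn_layer` and all shapes below are hypothetical.

    ```python
    import numpy as np

    def tdnn_layer(x, weights, offsets, steps):
        """One sub-sampled TDNN layer (illustrative sketch).

        x       : (T, D_in) input activations over T frames
        weights : (len(offsets) * D_in, D_out) splicing weight matrix
        offsets : time offsets to splice, e.g. (-3, 3)
        steps   : frame indices at which outputs are actually needed
        Returns the (len(steps), D_out) activations and the set of
        input frame indices that were touched, showing the savings.
        """
        T, _ = x.shape
        outs, touched = [], set()
        for t in steps:
            # Clamp offsets at the utterance edges, then splice.
            idx = [min(max(t + o, 0), T - 1) for o in offsets]
            touched.update(idx)
            spliced = np.concatenate([x[i] for i in idx])
            outs.append(np.maximum(spliced @ weights, 0.0))  # ReLU
        return np.stack(outs), touched
    ```

    With offsets like (-3, 3) at deeper layers, only a sparse subset of lower-layer frames is ever computed, which is the source of the reduced training time the abstract reports.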
  • Keywords
    "Speech","Context","Training data","Neural networks","Training","Acoustics","Databases"
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
  • Type
    conf
  • DOI
    10.1109/ASRU.2015.7404842
  • Filename
    7404842