• DocumentCode
    179897
  • Title
    Real-time one-pass decoding with recurrent neural network language model for speech recognition
  • Author
    Hori, Takaaki ; Kubo, Yotaro ; Nakamura, Atsushi
  • Author_Institution
    NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    6364
  • Lastpage
    6368
  • Abstract
    This paper proposes an efficient one-pass decoding method for real-time speech recognition employing a recurrent neural network language model (RNNLM). An RNNLM is an effective language model that yields a large gain in recognition accuracy when combined with a standard n-gram model. However, since every word probability distribution based on an RNNLM depends on the entire history from the beginning of the speech, the search space in Viterbi decoding grows exponentially with the length of the recognition hypotheses, making the computation prohibitively expensive. Therefore, an RNNLM is usually applied by N-best rescoring or by approximating it with a back-off n-gram model. In this paper, we present another approach that enables one-pass Viterbi decoding with an RNNLM without approximation, in which the RNNLM is represented as a prefix tree of possible word sequences, and only the part needed for decoding is generated on the fly and used to rescore each hypothesis with an on-the-fly composition technique we previously proposed. Experimental results on the MIT lecture transcription task show that our proposed method enables one-pass decoding with a small overhead for the RNNLM and achieves slightly higher accuracy than 1000-best rescoring. Furthermore, it reduces the latency from the end of each utterance by a factor of 10 compared with two-pass decoding.
  • Keywords
    decoding; recurrent neural nets; speech coding; speech recognition; statistical distributions; MIT lecture transcription task; N-best rescoring; RNNLM; back-off n-gram model; on-the-fly composition technique; one-pass Viterbi decoding; real-time one-pass decoding method; real-time speech recognition; recognition hypotheses; recurrent neural network language model; word probability distribution; Computational modeling; Decoding; Recurrent neural networks; Speech; Speech recognition; Transducers; Vectors; On-the-fly rescoring; Recurrent neural network language model; Speech recognition; Weighted finite-state transducer;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Title
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Conference_Location
    Florence, Italy
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6854829
  • Filename
    6854829
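
The on-the-fly prefix-tree rescoring idea summarized in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the RNNLM interface (initial_state/advance), the DummyRNNLM stand-in, and the log-linear combination with an assumed weight of 0.5.

# Minimal sketch (not the authors' implementation) of the on-the-fly RNNLM
# prefix-tree rescoring idea described in the abstract.  The RNNLM interface
# (initial_state / advance), the dummy model, and the interpolation weight
# are illustrative assumptions.
import math


class DummyRNNLM:
    """Stand-in RNNLM with a uniform distribution; a real system would use a
    trained recurrent network mapping (hidden state, word) -> new state plus
    a distribution over the next word."""

    def __init__(self, vocab):
        self.vocab = list(vocab)

    def initial_state(self):
        logp = math.log(1.0 / len(self.vocab))
        return (), {w: logp for w in self.vocab}

    def advance(self, state, word):
        # Append the word to the (toy) state; return new next-word log-probs.
        logp = math.log(1.0 / len(self.vocab))
        return state + (word,), {w: logp for w in self.vocab}


class PrefixNode:
    """One node of the lazily built prefix tree: a word history, the RNNLM
    state reached after that history, and the cached next-word log-probs."""

    def __init__(self, state, log_probs):
        self.state = state
        self.log_probs = log_probs
        self.children = {}  # word -> PrefixNode, created only when needed


class OnTheFlyRescorer:
    def __init__(self, rnnlm, rnn_weight=0.5):
        self.rnnlm = rnnlm
        self.rnn_weight = rnn_weight  # interpolation weight (assumed value)
        state, log_probs = rnnlm.initial_state()
        self.root = PrefixNode(state, log_probs)

    def extend(self, node, word):
        """Return the child for `word`, running the RNNLM only on first use,
        so a history shared by many hypotheses is evaluated once."""
        if word not in node.children:
            state, log_probs = self.rnnlm.advance(node.state, word)
            node.children[word] = PrefixNode(state, log_probs)
        return node.children[word]

    def score(self, node, word, ngram_logp):
        """Combine the n-gram score with the RNNLM score for one word step
        (a log-linear combination here; the paper's combination may differ)."""
        rnn_logp = node.log_probs.get(word, float("-inf"))
        return (1.0 - self.rnn_weight) * ngram_logp + self.rnn_weight * rnn_logp


# Usage: each decoding hypothesis carries a pointer to its PrefixNode; when
# the decoder proposes a next word, `score` rescores that step and `extend`
# advances the pointer, so only histories reached by the beam are computed.
rescorer = OnTheFlyRescorer(DummyRNNLM(["the", "cat", "sat", "</s>"]))
node = rescorer.root
for word in ["the", "cat", "sat"]:
    combined = rescorer.score(node, word, ngram_logp=-2.0)
    node = rescorer.extend(node, word)

Caching the hidden state and next-word distribution at each prefix node is what keeps the overhead small in this sketch: histories are expanded lazily and shared across all hypotheses that reach the same node, mirroring the on-the-fly generation of only the needed part of the prefix tree described in the abstract.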