• DocumentCode
    36280
  • Title
    Structured Output Layer Neural Network Language Models for Speech Recognition

  • Author
    Le, Hai-Son ; Oparin, Ilya ; Allauzen, Alexandre ; Gauvain, Jean-Luc ; Yvon, François

  • Author_Institution
    LIMSI, Univ. Paris-Sud, Orsay, France
  • Volume
    21
  • Issue
    1
  • fYear
    2013
  • fDate
    Jan. 2013
  • Firstpage
    197
  • Lastpage
    206
  • Abstract
    This paper extends a novel neural network language model (NNLM) which relies on word clustering to structure the output vocabulary: Structured OUtput Layer (SOUL) NNLM. This model is able to handle arbitrarily-sized vocabularies, hence dispensing with the need for shortlists that are commonly used in NNLMs. Several softmax layers replace the standard output layer in this model. The output structure depends on the word clustering which is based on the continuous word representation determined by the NNLM. Mandarin and Arabic data are used to evaluate the SOUL NNLM accuracy via speech-to-text experiments. Well-tuned speech-to-text systems (with error rates around 10%) serve as the baselines. The SOUL model achieves consistent improvements over a classical shortlist NNLM both in terms of perplexity and recognition accuracy for these two languages that are quite different in terms of their internal structure and recognition vocabulary size. An enhanced training scheme is proposed that allows more data to be used at each training iteration of the neural network.
  • Keywords
    iterative methods; natural languages; neural nets; speech recognition; text analysis; Arabic data; Mandarin data; SOUL NNLM accuracy; arbitrarily-sized vocabularies; continuous word representation; speech-to-text experiments; speech-to-text systems; structured output layer NNLM; structured output layer neural network language model; training iteration; training scheme enhancement; Artificial neural networks; Computer architecture; Context; Standards; Training; Vocabulary; Automatic speech recognition; neural network language model; speech-to-text
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2012.2215599
  • Filename
    6289355