• DocumentCode
    134248
  • Title
    Rapid Bayesian learning for recurrent neural network language model
  • Author
    Jen-Tzung Chien ; Yuan-Chu Ku ; Mou-Yue Huang
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
  • fYear
    2014
  • fDate
    12-14 Sept. 2014
  • Firstpage
    34
  • Lastpage
    38
  • Abstract
    This paper presents Bayesian learning for the recurrent neural network language model (RNN-LM). Our goal is to regularize the RNN-LM by compensating for the randomness of the estimated model parameters, which is characterized by a Gaussian prior. The model is not only constructed by training the synaptic weight parameters according to the maximum a posteriori criterion but also regularized by estimating the Gaussian hyper-parameter through type-2 maximum likelihood. However, a critical issue in Bayesian RNN-LM is the heavy computation of the Hessian matrix, which is formed as the sum of a large number of outer products of high-dimensional gradient vectors. We present a rapid approximation that reduces the redundancy due to the curse of dimensionality and speeds up the calculation by summing only the salient outer products. Experiments on the 1B-Word Benchmark, Penn Treebank and Wall Street Journal corpora show that the rapid Bayesian RNN-LM consistently improves perplexity and word error rate in comparison with the standard RNN-LM.
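    The core speed-up described in the abstract is replacing the full outer-product sum for the Hessian, H ≈ Σ_t g_t g_tᵀ, with a sum over only the salient terms. A minimal NumPy sketch of that idea follows; the abstract does not specify the saliency criterion, so selecting the k gradients with the largest L2 norm is an illustrative assumption, not the paper's actual method:

    ```python
    import numpy as np

    def approx_hessian(grads, k):
        """Approximate H ~ sum of g g^T over the k most salient gradients.

        Saliency here means largest L2 norm -- an assumed criterion for
        illustration; the paper's exact selection rule is not given in
        the abstract. grads has shape (T, d): T gradient vectors of
        dimension d. Returns a d x d symmetric PSD matrix of rank <= k.
        """
        norms = np.linalg.norm(grads, axis=1)   # L2 norm of each gradient
        top = np.argsort(norms)[-k:]            # indices of the k largest
        G = grads[top]                          # (k, d) salient gradients
        return G.T @ G                          # sum of k outer products

    # Toy usage: 1000 gradient vectors of dimension 50, keep 100 terms.
    rng = np.random.default_rng(0)
    grads = rng.normal(size=(1000, 50))
    H = approx_hessian(grads, k=100)
    ```

    Keeping only k ≪ T outer products cuts the accumulation cost from O(T·d²) to O(k·d²) while preserving symmetry and positive semi-definiteness of the approximation.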
  • Keywords
    Bayes methods; Hessian matrices; gradient methods; learning (artificial intelligence); parameter estimation; recurrent neural nets; speech recognition; 1B-Word Benchmark; Gaussian hyper-parameter estimation; Hessian matrix; Penn Treebank; Wall Street Journal corpora; high-dimensional gradient vector; maximum a posteriori criterion; model parameter estimation; rapid Bayesian RNN-LM; rapid Bayesian learning; rapid approximation; recurrent neural network language model; synaptic weight parameters; type-2 maximum likelihood; word error rate; Approximation methods; Bayes methods; Computational modeling; Recurrent neural networks; Speech recognition; Training; Vectors; Bayesian learning; Hessian matrix; Recurrent neural network language model; speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
  • Conference_Location
    Singapore
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2014.6936640
  • Filename
    6936640