• DocumentCode
    134205
  • Title
    Investigation of stochastic Hessian-Free optimization in Deep neural networks for speech recognition
  • Author
    Zhao You; Bo Xu
  • Author_Institution
    Interactive Digital Media Technol. Res. Center, Inst. of Autom., Beijing, China
  • fYear
    2014
  • fDate
    12-14 Sept. 2014
  • Firstpage
    450
  • Lastpage
    453
  • Abstract
    Effective training of deep neural networks (DNNs) is of great importance for DNN-based speech recognition systems. Stochastic gradient descent (SGD) is the most popular method for training DNNs, and it often yields solutions that generalize well to held-out data. Recently, Hessian-Free (HF) optimization has proved to be an alternative algorithm for training DNNs, and it can be used to solve pathological tasks. Stochastic Hessian-Free (SHF) optimization is a variant of HF that combines the generalization advantages of SGD with the second-order information of HF. This paper focuses on investigating the SHF algorithm for DNN training. We evaluate the algorithm on a 100-hour Mandarin Chinese recorded-speech recognition task. The first experiment shows that choosing proper sizes for the gradient and curvature minibatches reduces training time while maintaining good performance. Next, we observe that the performance of SHF does not depend on the initial parameters. Furthermore, experimental results show that SHF performs comparably to SGD and better than traditional HF. Finally, we find that an additional performance improvement is obtained with the dropout algorithm.
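    The core idea described above — a gradient computed on a larger "gradient minibatch" combined with curvature (Hessian-vector) information from a smaller "curvature minibatch", solved with conjugate gradient — can be sketched in a few lines. The following is a minimal illustration only, using damped logistic regression on toy data as a stand-in for the paper's DNN; all function names, the finite-difference Hessian-vector product, and the damping value are this sketch's assumptions, not the authors' implementation.

    ```python
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def loss(w, X, y):
        # Mean cross-entropy of a logistic model (toy stand-in for the DNN loss).
        p = sigmoid(X @ w)
        eps = 1e-12
        return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    def grad(w, X, y):
        return X.T @ (sigmoid(X @ w) - y) / len(y)

    def hvp(w, v, X, y, eps=1e-6):
        # Hessian-vector product via central finite differences of the gradient
        # (an assumption of this sketch; HF papers typically use R-operator /
        # Gauss-Newton products).
        return (grad(w + eps * v, X, y) - grad(w - eps * v, X, y)) / (2 * eps)

    def cg(hvp_fn, b, damping, iters=10):
        # Conjugate gradient for (H + damping*I) d = b.
        d = np.zeros_like(b)
        r = b.copy()
        p = r.copy()
        rs = r @ r
        if rs < 1e-12:
            return d
        for _ in range(iters):
            Ap = hvp_fn(p) + damping * p
            alpha = rs / (p @ Ap)
            d += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if rs_new < 1e-10:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return d

    def shf_step(w, X, y, grad_bs, curv_bs, damping, rng):
        # Gradient on a larger minibatch, curvature products on a smaller one.
        gi = rng.choice(len(y), grad_bs, replace=False)
        ci = rng.choice(len(y), curv_bs, replace=False)
        g = grad(w, X[gi], y[gi])
        d = cg(lambda v: hvp(w, v, X[ci], y[ci]), -g, damping)
        return w + d

    # Toy separable data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(512, 5))
    true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    y = (X @ true_w > 0).astype(float)

    w = np.zeros(5)
    for _ in range(20):
        w = shf_step(w, X, y, grad_bs=256, curv_bs=64, damping=0.1, rng=rng)
    ```

    The interplay the paper studies is visible here: a larger `grad_bs` stabilizes the update direction, while `curv_bs` only has to be big enough to make the CG curvature estimates useful, which is what keeps the per-iteration cost manageable.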
  • Keywords
    gradient methods; natural language processing; neural nets; speech recognition; stochastic programming; DNN training; HF optimization; Hessian free optimization; Mandarin Chinese recorded speech recognition task; SGD; SHF algorithm; deep neural networks; generalization advantages; optional algorithm; pathological tasks; second-order information; speech recognition system; stochastic Hessian free; stochastic Hessian-Free optimization; stochastic gradient descent; Error analysis; Neural networks; Optimization; Speech recognition; Stochastic processes; Training; Deep neural networks; Dropout; Speech recognition; Stochastic Hessian-Free optimization;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
  • Conference_Location
    Singapore
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2014.6936597
  • Filename
    6936597