• DocumentCode
    134205
  • Title
    Investigation of stochastic Hessian-Free optimization in Deep neural networks for speech recognition
  • Author
    Zhao You; Bo Xu
  • Author_Institution
    Interactive Digital Media Technol. Res. Center, Inst. of Autom., Beijing, China
  • fYear
    2014
  • fDate
    12-14 Sept. 2014
  • Firstpage
    450
  • Lastpage
    453
  • Abstract
    Effective training of deep neural networks (DNNs) is of great importance for DNN-based speech recognition systems. Stochastic gradient descent (SGD) is the most popular method for training DNNs, and it often yields solutions that generalize well to held-out data. Recently, Hessian-Free (HF) optimization has proved to be an alternative algorithm for training DNNs, and it can be used to solve pathological tasks. Stochastic Hessian-Free (SHF) optimization is a variant of HF that combines the generalization advantages of SGD with the second-order information of HF. This paper focuses on investigating the SHF algorithm for DNN training. We evaluate the algorithm on a 100-hour Mandarin Chinese recorded-speech recognition task. The first experiment shows that choosing proper sizes for the gradient and curvature minibatches reduces training time while maintaining good performance. Next, we observe that the performance of SHF does not depend on the initial parameters. Furthermore, experimental results show that SHF performs comparably to SGD and better than traditional HF. Finally, we find that an additional performance improvement is obtained with the dropout algorithm.
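    The core idea described above — a gradient computed on a larger "gradient minibatch" combined with curvature (Hessian-vector) information from a smaller "curvature minibatch", solved with conjugate gradient — can be sketched in a few lines. The following is a minimal illustration only, using damped logistic regression on toy data as a stand-in for the paper's DNN; all function names, the finite-difference Hessian-vector product, and the damping value are this sketch's assumptions, not the authors' implementation.

    ```python
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def loss(w, X, y):
        # Mean cross-entropy of a logistic model (toy stand-in for the DNN loss).
        p = sigmoid(X @ w)
        eps = 1e-12
        return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    def grad(w, X, y):
        return X.T @ (sigmoid(X @ w) - y) / len(y)

    def hvp(w, v, X, y, eps=1e-6):
        # Hessian-vector product via central finite differences of the gradient
        # (an assumption of this sketch; HF papers typically use R-operator /
        # Gauss-Newton products).
        return (grad(w + eps * v, X, y) - grad(w - eps * v, X, y)) / (2 * eps)

    def cg(hvp_fn, b, damping, iters=10):
        # Conjugate gradient for (H + damping*I) d = b.
        d = np.zeros_like(b)
        r = b.copy()
        p = r.copy()
        rs = r @ r
        if rs < 1e-12:
            return d
        for _ in range(iters):
            Ap = hvp_fn(p) + damping * p
            alpha = rs / (p @ Ap)
            d += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if rs_new < 1e-10:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return d

    def shf_step(w, X, y, grad_bs, curv_bs, damping, rng):
        # Gradient on a larger minibatch, curvature products on a smaller one.
        gi = rng.choice(len(y), grad_bs, replace=False)
        ci = rng.choice(len(y), curv_bs, replace=False)
        g = grad(w, X[gi], y[gi])
        d = cg(lambda v: hvp(w, v, X[ci], y[ci]), -g, damping)
        return w + d

    # Toy separable data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(512, 5))
    true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    y = (X @ true_w > 0).astype(float)

    w = np.zeros(5)
    for _ in range(20):
        w = shf_step(w, X, y, grad_bs=256, curv_bs=64, damping=0.1, rng=rng)
    ```

    The interplay the paper studies is visible here: a larger `grad_bs` stabilizes the update direction, while `curv_bs` only has to be big enough to make the CG curvature estimates useful, which is what keeps the per-iteration cost manageable.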
  • Keywords
    gradient methods; natural language processing; neural nets; speech recognition; stochastic programming; DNN training; HF optimization; Hessian free optimization; Mandarin Chinese recorded speech recognition task; SGD; SHF algorithm; deep neural networks; generalization advantages; optional algorithm; pathological tasks; second-order information; speech recognition system; stochastic Hessian free; stochastic Hessian-Free optimization; stochastic gradient descent; Error analysis; Neural networks; Optimization; Speech recognition; Stochastic processes; Training; Deep neural networks; Dropout; Speech recognition; Stochastic Hessian-Free optimization;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
  • Conference_Location
    Singapore
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2014.6936597
  • Filename
    6936597