Title :
Asynchronous stochastic gradient descent for DNN training
Author :
Shanshan Zhang ; Ce Zhang ; Zhao You ; Rong Zheng ; Bo Xu
Author_Institution :
Interactive Digital Media Technology Research Center, Institute of Automation, Beijing, China
Abstract :
It is well known that state-of-the-art speech recognition systems using deep neural networks (DNNs) greatly improve performance compared with conventional GMM-HMM systems. The corresponding price, however, is an immense training cost caused by the enormous number of DNN parameters. Unfortunately, the minibatch-based back-propagation (BP) algorithm used in DNN training is difficult to parallelize because of its frequent model updates. In this paper we describe an effective approximation of BP, asynchronous stochastic gradient descent (ASGD), which parallelizes the computation across multiple GPUs. In this approach, multiple GPUs work asynchronously, each computing gradients and updating the global model parameters. Experimental results show a 3.2 times speed-up on 4 GPUs over a single GPU, without any loss in recognition performance.
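To illustrate the asynchronous update pattern the abstract describes, the following is a minimal sketch in which worker threads stand in for GPUs that independently pull the global parameters, compute minibatch gradients, and push updates. It is not the authors' implementation; the names (ParameterServer, worker), the toy least-squares task, and the locking strategy are illustrative assumptions only.

```python
# Sketch of asynchronous SGD: several workers (standing in for GPUs) share one
# global model and update it without waiting for each other.
import threading
import numpy as np

class ParameterServer:
    """Holds the global model parameters; workers push gradients asynchronously."""
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()   # guards the update; a real system may relax this

    def pull(self):
        with self.lock:
            return self.w.copy()       # workers compute on a possibly stale copy

    def push(self, grad):
        with self.lock:
            self.w -= self.lr * grad   # apply the worker's gradient immediately

def worker(server, X, y, steps, batch=32, seed=None):
    """One 'GPU': repeatedly pull parameters, compute a minibatch gradient, push it."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        idx = rng.integers(0, len(X), size=batch)
        w = server.pull()
        err = X[idx] @ w - y[idx]              # least-squares residual on the minibatch
        grad = X[idx].T @ err / batch
        server.push(grad)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=10)
    X = rng.normal(size=(5000, 10))
    y = X @ w_true + 0.01 * rng.normal(size=5000)

    server = ParameterServer(dim=10)
    threads = [threading.Thread(target=worker, args=(server, X, y, 500, 32, i))
               for i in range(4)]              # 4 asynchronous workers, like 4 GPUs
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print("parameter error:", np.linalg.norm(server.pull() - w_true))
```

Because workers may compute gradients on slightly stale parameters, the updates only approximate synchronous minibatch BP, which is the trade-off the paper exploits to obtain the reported multi-GPU speed-up.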
Keywords :
backpropagation; gradient methods; neural nets; speech recognition; stochastic processes; DNN training; asynchronous stochastic gradient descent; backpropagation; global model parameters; multi-GPU; speech recognition; Computational modeling; Data models; Graphics processing units; Servers; Speech recognition; Stochastic processes; Training; GPU parallelization; asynchronous SGD; deep neural network; speech recognition;
Conference_Titel :
2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6638950