• DocumentCode
    1685978
  • Title
    Asynchronous stochastic gradient descent for DNN training
  • Author
    Shanshan Zhang ; Ce Zhang ; Zhao You ; Rong Zheng ; Bo Xu
  • Author_Institution
    Interactive Digital Media Technol. Res. Center, Inst. of Autom., Beijing, China
  • fYear
    2013
  • Firstpage
    6660
  • Lastpage
    6663
  • Abstract
    State-of-the-art speech recognition systems based on deep neural networks (DNNs) are well known to greatly outperform conventional GMM-HMM systems. The price is an immense training cost, owing to the enormous number of DNN parameters. Moreover, the minibatch-based back-propagation (BP) algorithm used in DNN training is difficult to parallelize because of its frequent model updates. In this paper we describe an effective approximation of BP, asynchronous stochastic gradient descent (ASGD), which parallelizes the computation across multiple GPUs. In this approach, multiple GPUs work asynchronously to compute gradients and update the global model parameters. Experimental results show a 3.2 times speed-up on 4 GPUs over a single GPU, without any loss in recognition performance.
  • Keywords
    backpropagation; gradient methods; neural nets; speech recognition; stochastic processes; DNN training; asynchronous stochastic gradient descent; global model parameters; multiGPU; Computational modeling; Data models; Graphics processing units; Servers; Training; GPU parallelization; asynchronous SGD; deep neural network
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2013.6638950
  • Filename
    6638950
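
The idea summarized in the abstract, several workers computing minibatch gradients asynchronously and applying them to one set of global parameters, can be illustrated with a minimal sketch. This is not the paper's implementation, which this record does not describe. Everything below is an assumption made for illustration: Python threads stand in for GPUs, a two-parameter linear model stands in for the DNN, and the names `ParameterServer`, `worker`, and `train_asgd` are hypothetical. Because Python threads share one interpreter lock, the sketch shows the update pattern only and will not reproduce any speed-up.

```python
import random
import threading


class ParameterServer:
    """Holds the global parameters (hypothetical helper, not from the paper)."""

    def __init__(self, dim, lr):
        self.w = [0.0] * dim
        self.lr = lr
        self.lock = threading.Lock()

    def pull(self):
        # Return a snapshot; it may be stale by the time a gradient comes back.
        with self.lock:
            return list(self.w)

    def push(self, grad):
        # Apply a gradient as soon as it arrives, without waiting for other workers.
        with self.lock:
            for i, g in enumerate(grad):
                self.w[i] -= self.lr * g


def minibatch_gradient(w, batch):
    """Gradient of the mean squared error for the toy model y = w[0]*x + w[1]."""
    g0 = g1 = 0.0
    for x, y in batch:
        err = w[0] * x + w[1] - y
        g0 += 2.0 * err * x
        g1 += 2.0 * err
    n = len(batch)
    return [g0 / n, g1 / n]


def worker(server, shard, batch_size, steps, seed):
    """One asynchronous worker: pull parameters, compute a gradient, push it."""
    rng = random.Random(seed)
    for _ in range(steps):
        w = server.pull()
        batch = rng.sample(shard, batch_size)
        server.push(minibatch_gradient(w, batch))


def train_asgd(data, n_workers=4, lr=0.05, batch_size=16, steps=500):
    server = ParameterServer(2, lr)
    # Each worker gets its own shard of the training data.
    shards = [data[i::n_workers] for i in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(server, shards[i], batch_size, steps, i))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.pull()


# Noise-free synthetic data generated from y = 3x - 1.
data_rng = random.Random(0)
data = [(x, 3.0 * x - 1.0) for x in (data_rng.uniform(-1.0, 1.0) for _ in range(400))]
w = train_asgd(data)
print(w)
```

On this toy problem the recovered parameters should land close to the slope 3 and intercept -1 used to generate the data, even though workers may push gradients computed from slightly outdated parameters. The lock only protects each individual read or write of the shared parameters; workers never wait for one another between steps, which is the asynchronous part.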