Title :
A distributed architecture for fast SGD sequence discriminative training of DNN acoustic models
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
We describe a hybrid GPU/CPU architecture for stochastic gradient descent (SGD) training of neural network acoustic models under a lattice-based minimum Bayes risk (MBR) criterion. The crux of the method is to run SGD on a GPU that consumes frame-randomized minibatches produced by multiple workers on a cluster of multi-core CPU nodes, which compute the HMM state MBR occupancies. To minimize communication cost, a separate thread running on the GPU host receives minibatches from the workers, sends updated models back to them, and communicates with the SGD thread via a producer-consumer queue of minibatches. Using this architecture, sequence discriminative training can match the speed of GPU-based SGD cross-entropy (CE) training (1 hour of processing per 100 hours of audio on Switchboard). Additionally, we compare different ways of doing frame randomization and discuss experimental results on three LVCSR tasks (Switchboard 300 hours, English broadcast news 50 hours, and noisy Levantine telephone conversations 300 hours).
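The producer-consumer decoupling between the communication thread and the SGD thread can be illustrated with a minimal, self-contained Python sketch. This is not the paper's implementation: the worker communication and the GPU gradient step are simulated, and every name below (receive_minibatch_from_worker, sgd_thread, QUEUE_CAPACITY, and so on) is a hypothetical placeholder.

    # Sketch of the producer-consumer minibatch queue described in the abstract.
    # Worker communication and GPU updates are simulated; all names are hypothetical.
    import queue
    import random
    import threading

    QUEUE_CAPACITY = 16   # bounded queue gives back-pressure if SGD falls behind
    SENTINEL = None       # marks the end of the training data

    minibatch_queue = queue.Queue(maxsize=QUEUE_CAPACITY)

    def receive_minibatch_from_worker():
        """Stand-in for receiving a frame-randomized minibatch of features and
        MBR state occupancies from a CPU worker over the network."""
        return [random.random() for _ in range(256)]

    def communication_thread(num_minibatches):
        """GPU-host thread: pulls minibatches from the workers and enqueues them
        for the SGD thread; put() blocks when the queue is full."""
        for _ in range(num_minibatches):
            minibatch_queue.put(receive_minibatch_from_worker())
        minibatch_queue.put(SENTINEL)

    def sgd_thread():
        """Consumes minibatches from the queue and applies (simulated) SGD updates."""
        updates = 0
        while True:
            batch = minibatch_queue.get()
            if batch is SENTINEL:
                break
            _ = sum(batch)    # placeholder for the actual GPU gradient step
            updates += 1
        print(f"applied {updates} SGD updates")

    producer = threading.Thread(target=communication_thread, args=(1000,))
    consumer = threading.Thread(target=sgd_thread)
    producer.start(); consumer.start()
    producer.join(); consumer.join()

Because the queue is bounded, network I/O with the CPU workers overlaps with GPU computation while neither side can run arbitrarily far ahead of the other.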
Keywords :
Bayes methods; acoustic signal processing; entropy; gradient methods; graphics processing units; hidden Markov models; learning (artificial intelligence); multi-threading; multiprocessing systems; neural nets; CE training; CPU architecture; DNN acoustic models; English broadcast news; GPU architecture; GPU card; GPU host; GPU-based SGD cross-entropy training; HMM state MBR occupancies; LVCSR tasks; MBR criterion; SGD sequence discriminative training; SGD thread; Switchboard task; communication cost minimization; distributed architecture; frame randomization; frame-randomized minibatches; lattice-based minimum Bayes risk criterion; multicore CPU node cluster; neural network acoustic models; noisy Levantine telephone conversations; producer-consumer queue; stochastic gradient descent training; Instruction sets; Robustness; Training; sequence discriminative training; stochastic gradient descent
Conference_Title :
2014 IEEE Spoken Language Technology Workshop (SLT)
DOI :
10.1109/SLT.2014.7078571