Title :
Exploiting sparseness in deep neural networks for large vocabulary speech recognition
Author :
Yu, Dong ; Seide, Frank ; Li, Gang ; Deng, Li
Author_Institution :
Microsoft Res., Redmond, WA, USA
Abstract :
Recently, we developed context-dependent deep neural network (DNN) hidden Markov models for large vocabulary speech recognition. While reducing errors by 33% compared with their discriminatively trained Gaussian-mixture counterparts on the Switchboard benchmark task, these DNNs require many more parameters. In this paper, we report our recent work on improving DNN generalization, model size, and computation speed by exploiting parameter sparseness. We formulate the goal of enforcing sparseness as soft regularization and convex constraint optimization problems, and propose solutions under the stochastic gradient ascent setting. We also propose novel data structures that exploit the random sparseness patterns to reduce model size and computation time. The proposed solutions have been evaluated on the voice-search and Switchboard datasets. They decreased the number of nonzero connections to one third while reducing the error rate by 0.2-0.3% compared with the fully connected model on both datasets. The nonzero connections were further reduced to only 12% and 19% on the two respective datasets without sacrificing speech recognition performance. Under these conditions, the model size shrinks to 18% and 29%, and the computation to 14% and 23%, of the fully connected baseline on the respective datasets.
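To make the two sparsification formulations concrete, the following Python/NumPy sketch pairs a stochastic gradient ascent step carrying an L1-style soft regularization (a proximal shrinkage step) with a hard truncation step that keeps only the largest-magnitude weights, one standard way to realize the constraint view. The function names, the learning rate lr, the penalty weight lam, and keep_fraction are illustrative assumptions, not the paper's exact recipe.

    import numpy as np

    def sgd_ascent_step_l1(W, grad, lr, lam):
        """One stochastic gradient ascent step with an L1 soft penalty.

        `grad` is the gradient of the training objective w.r.t. W, assumed
        supplied by the trainer; `lam` weights the sparseness penalty.
        """
        W = W + lr * grad  # ascent on the training objective
        # proximal shrinkage: soft regularization pulling weights to zero
        return np.sign(W) * np.maximum(np.abs(W) - lr * lam, 0.0)

    def truncate_to_constraint(W, keep_fraction):
        """Zero all but the largest-magnitude weights (constraint view).

        Returns the truncated matrix and the sparseness mask.
        """
        k = max(1, int(W.size * keep_fraction))
        thresh = np.partition(np.abs(W).ravel(), W.size - k)[W.size - k]
        mask = np.abs(W) >= thresh
        return W * mask, mask

Once the pattern is fixed, a masked update such as W = (W + lr * grad) * mask keeps subsequent training inside the constraint set, so the zeroed connections stay zero.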
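The abstract leaves the data structures unspecified; a compressed sparse row (CSR) layout is a plausible stand-in for storing random sparseness patterns, since it keeps only the nonzero weights plus their column indices and turns the forward pass into one short dot product per output unit. The sketch below is written under that assumption and is not the authors' exact structure.

    import numpy as np

    def to_csr(W, mask):
        # indptr[i]:indptr[i+1] delimits row i's nonzeros; indices holds
        # their column ids, data their weight values.
        indptr, indices, data = [0], [], []
        for row, row_mask in zip(W, mask):
            cols = np.nonzero(row_mask)[0]
            indices.extend(cols)
            data.extend(row[cols])
            indptr.append(len(indices))
        return (np.array(indptr), np.array(indices, dtype=np.int64),
                np.array(data))

    def csr_matvec(indptr, indices, data, x):
        # y = W @ x using only the stored nonzero connections
        y = np.empty(len(indptr) - 1)
        for i in range(len(indptr) - 1):
            lo, hi = indptr[i], indptr[i + 1]
            y[i] = data[lo:hi] @ x[indices[lo:hi]]
        return y

Storing only 12-19% of the connections shrinks the model roughly in line with the 18% and 29% figures reported in the abstract; the gap is the overhead of the index arrays.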
Keywords :
convex programming; gradient methods; neural nets; random processes; speech recognition; stochastic processes; vocabulary; convex constraint optimization problem; data structure; deep neural network; large vocabulary speech recognition; model size reduction; nonzero connection; parameter sparseness; random sparseness pattern; soft regularization; stochastic gradient ascent; switchboard dataset; voice-search dataset; Computational modeling; Data structures; Hidden Markov models; Indexes; Speech; Speech recognition; Training; deep belief networks; deep neural networks; sparseness; speech recognition;
Conference_Titel :
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-0045-2
Electronic_ISSN :
1520-6149
DOI :
10.1109/ICASSP.2012.6288897