Title :
Exploiting sparseness in deep neural networks for large vocabulary speech recognition
Author :
Yu, Dong ; Seide, Frank ; Li, Gang ; Deng, Li
Author_Institution :
Microsoft Res., Redmond, WA, USA
Abstract :
Recently, we developed context-dependent deep neural network (DNN) hidden Markov models for large vocabulary speech recognition. While reducing errors by 33% compared with their discriminatively trained Gaussian-mixture counterparts on the Switchboard benchmark task, these DNNs require many more parameters. In this paper, we report our recent work on improving DNN generalization, model size, and computation speed by exploiting parameter sparseness. We formulate the goal of enforcing sparseness as soft regularization and convex constraint optimization problems, and propose solutions under the stochastic gradient ascent setting. We also propose novel data structures that exploit the random sparseness patterns to reduce model size and computation time. The proposed solutions have been evaluated on the voice-search and Switchboard datasets. They decreased the number of nonzero connections to one third while reducing the error rate by 0.2-0.3% compared with the fully connected model on both datasets. The nonzero connections were further reduced to only 12% and 19% on the two respective datasets without sacrificing speech recognition performance. Under these conditions, the model size shrinks to 18% and 29%, and the computation to 14% and 23%, of the fully connected baseline on the respective datasets.
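To make the two sparsification formulations concrete, the following Python/NumPy sketch pairs a stochastic gradient ascent step carrying an L1-style soft regularization (a proximal shrinkage step) with a hard truncation step that keeps only the largest-magnitude weights, one standard way to realize the constraint view. The function names, the learning rate lr, the penalty weight lam, and keep_fraction are illustrative assumptions, not the paper's exact recipe.

    import numpy as np

    def sgd_ascent_step_l1(W, grad, lr, lam):
        """One stochastic gradient ascent step with an L1 soft penalty.

        `grad` is the gradient of the training objective w.r.t. W, assumed
        supplied by the trainer; `lam` weights the sparseness penalty.
        """
        W = W + lr * grad  # ascent on the training objective
        # proximal shrinkage: soft regularization pulling weights to zero
        return np.sign(W) * np.maximum(np.abs(W) - lr * lam, 0.0)

    def truncate_to_constraint(W, keep_fraction):
        """Zero all but the largest-magnitude weights (constraint view).

        Returns the truncated matrix and the sparseness mask.
        """
        k = max(1, int(W.size * keep_fraction))
        thresh = np.partition(np.abs(W).ravel(), W.size - k)[W.size - k]
        mask = np.abs(W) >= thresh
        return W * mask, mask

Once the pattern is fixed, a masked update such as W = (W + lr * grad) * mask keeps subsequent training inside the constraint set, so the zeroed connections stay zero.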
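The abstract leaves the data structures unspecified; a compressed sparse row (CSR) layout is a plausible stand-in for storing random sparseness patterns, since it keeps only the nonzero weights plus their column indices and turns the forward pass into one short dot product per output unit. The sketch below is written under that assumption and is not the authors' exact structure.

    import numpy as np

    def to_csr(W, mask):
        # indptr[i]:indptr[i+1] delimits row i's nonzeros; indices holds
        # their column ids, data their weight values.
        indptr, indices, data = [0], [], []
        for row, row_mask in zip(W, mask):
            cols = np.nonzero(row_mask)[0]
            indices.extend(cols)
            data.extend(row[cols])
            indptr.append(len(indices))
        return (np.array(indptr), np.array(indices, dtype=np.int64),
                np.array(data))

    def csr_matvec(indptr, indices, data, x):
        # y = W @ x using only the stored nonzero connections
        y = np.empty(len(indptr) - 1)
        for i in range(len(indptr) - 1):
            lo, hi = indptr[i], indptr[i + 1]
            y[i] = data[lo:hi] @ x[indices[lo:hi]]
        return y

Storing only 12-19% of the connections shrinks the model roughly in line with the 18% and 29% figures reported in the abstract; the gap is the overhead of the index arrays.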
Keywords :
convex programming; gradient methods; neural nets; random processes; speech recognition; stochastic processes; vocabulary; convex constraint optimization problem; data structure; deep neural network; large vocabulary speech recognition; model size reduction; nonzero connection; parameter sparseness; random sparseness pattern; soft regularization; stochastic gradient ascent; switchboard dataset; voice-search dataset; Computational modeling; Data structures; Hidden Markov models; Indexes; Speech; Speech recognition; Training; deep belief networks; deep neural networks; sparseness; speech recognition;
Conference_Titel :
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-0045-2
Electronic_ISSN :
1520-6149
DOI :
10.1109/ICASSP.2012.6288897