Title :
Factorized adaptation for deep neural network
Author :
Jinyu Li ; Jui-Ting Huang ; Yifan Gong
Author_Institution :
Microsoft Corp., Redmond, WA, USA
Abstract :
In this paper, we propose a novel method to adapt a context-dependent deep neural network hidden Markov model (CD-DNN-HMM) with only a limited number of parameters by taking into account the underlying factors that contribute to the distorted speech signal. We derive this factorized adaptation method from two perspectives: joint factor analysis and vector Taylor series expansion. Evaluated on Aurora 4, the proposed method achieves 19.0% and 10.6% relative word error rate reductions on test sets B and D, respectively, with only 20 adaptation utterances, and still yields a decent improvement with as few as two adaptation utterances. We also show that the proposed method outperforms feature discriminative linear regression (fDLR), an existing DNN adaptation method. Its small number of parameters and short training time make it an attractive solution for low-footprint speech applications.
Keywords :
distortion; feature extraction; hidden Markov models; neural nets; regression analysis; series (mathematics); speech processing; Aurora 4; CD-DNN-HMM; adaptation utterances; context-dependent deep neural network hidden Markov model; fDLR; factorized adaptation; feature discriminative linear regression; joint factor analysis; low-footprint speech applications; speech signal distortion; vector Taylor series expansion; word error rate reduction; Acoustics; Neural networks; Noise; Speech; Training; Vectors; deep neural network; vector Taylor series
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854662