DocumentCode :
672378
Title :
Discriminative piecewise linear transformation based on deep learning for noise robust automatic speech recognition
Author :
Kashiwagi, Y. ; Saito, Daisuke ; Minematsu, Nobuaki ; Hirose, Keikichi
Author_Institution :
Grad. Sch. of Eng., Univ. of Tokyo, Tokyo, Japan
fYear :
2013
fDate :
8-12 Dec. 2013
Firstpage :
350
Lastpage :
355
Abstract :
In this paper, we propose using deep neural networks to extend conventional statistical feature enhancement methods based on piecewise linear transformation. Stereo-based piecewise linear compensation for environments (SPLICE), a powerful statistical approach to feature enhancement, models the probability distribution of input noisy features as a mixture of Gaussians. However, an input vector is sometimes soft-assigned to the divided regions inadequately, and it then undergoes an inappropriate conversion. Because each conversion is constrained to be linear, such misassignments easily degrade enhancement performance. Feature enhancement using neural networks is another powerful approach, since it can directly model the non-linear relationship between the noisy and clean feature spaces; however, it tends to suffer from over-fitting. We attempt to mitigate this problem by reducing the number of model parameters to be estimated. Our neural network is trained so that its output layer is associated with states in the clean feature space rather than in the noisy feature space. This makes the size of the output layer independent of the type of noisy environment. First, we model the distribution of clean features with a Gaussian mixture model; then, using deep neural networks, we discriminatively estimate which state in the clean space an input noisy feature corresponds to. Experimental evaluations on the Aurora 2 dataset show that the proposed method achieves the best performance among the conventional methods compared.
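The abstract describes enhancement as a soft, posterior-weighted combination of per-region linear conversions, where the two approaches differ in how the soft assignment p(k | y) is obtained. The Python sketch below illustrates that general form, x_hat = sum_k p(k | y) (A_k y + b_k), under stated assumptions: the diagonal-covariance GMM, the one-hidden-layer ReLU network standing in for a deep neural network, all function names, and all parameter values are hypothetical and for illustration only. The abstract does not specify how the per-component transforms (A_k, b_k) are estimated, so they are simply passed in as given parameters; this is not the authors' implementation.

```python
import math

# Hypothetical type aliases for readability.
Vector = list[float]
Matrix = list[list[float]]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_gaussian_diag(y: Vector, mean: Vector, var: Vector) -> float:
    """Log density of y under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (yi - mu) ** 2 / v
        for yi, mu, v in zip(y, mean, var)
    )


def gmm_posteriors(y: Vector, weights: Vector, means: list[Vector],
                   variances: list[Vector]) -> Vector:
    """SPLICE-style soft assignment: p(k | y) under a GMM of *noisy* features."""
    logits = [math.log(w) + log_gaussian_diag(y, mu, var)
              for w, mu, var in zip(weights, means, variances)]
    return softmax(logits)


def mlp_posteriors(y: Vector, W1: Matrix, b1: Vector,
                   W2: Matrix, b2: Vector) -> Vector:
    """Discriminative alternative (assumption): a tiny ReLU network whose
    softmax outputs are read as p(k | y) over K *clean-space* GMM components."""
    hidden = [max(0.0, sum(w * yj for w, yj in zip(row, y)) + bias)
              for row, bias in zip(W1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + bias
              for row, bias in zip(W2, b2)]
    return softmax(logits)


def piecewise_linear_enhance(y: Vector, posteriors: Vector,
                             transforms: list[tuple[Matrix, Vector]]) -> Vector:
    """Posterior-weighted piecewise linear mapping:
    x_hat = sum_k p(k | y) * (A_k @ y + b_k)."""
    dim = len(y)
    x_hat = [0.0] * dim
    for p, (A, b) in zip(posteriors, transforms):
        for i in range(dim):
            x_hat[i] += p * (sum(A[i][j] * y[j] for j in range(dim)) + b[i])
    return x_hat


if __name__ == "__main__":
    # Two hypothetical 1-D components whose transforms are pure bias shifts.
    transforms = [([[1.0]], [1.0]), ([[1.0]], [-3.0])]
    y = [2.0]
    p_gmm = gmm_posteriors(y, [0.5, 0.5], [[-1.0], [1.0]], [[1.0], [1.0]])
    p_dnn = mlp_posteriors(y, [[1.0]], [0.0], [[1.0], [-1.0]], [0.0, 0.0])
    print(piecewise_linear_enhance(y, p_gmm, transforms))
    print(piecewise_linear_enhance(y, p_dnn, transforms))
```

Keeping the mapping (`piecewise_linear_enhance`) separate from the posterior estimator mirrors one reading of the abstract: the linear pieces stay the same, and only the soft assignment changes, from a generative GMM over noisy features to a discriminative network whose output size is fixed by the number of clean-space components.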
Keywords :
Gaussian noise; feature extraction; learning (artificial intelligence); mixture models; neural nets; piecewise linear techniques; speech enhancement; speech recognition; state estimation; statistical analysis; statistical distributions; Aurora 2 dataset; Gaussian mixture model; SPLICE; clean feature space; deep learning; deep neural networks; discriminative piecewise linear transformation; input noisy feature; input vector soft assignment; neural network training; noise robust automatic speech recognition; noisy feature space; over-fitting problem; parameter estimation; probabilistic distribution; state estimation; statistical approach; statistical feature enhancement; stereo-based piecewise linear compensation for environments; Neural networks; Noise measurement; Signal to noise ratio; Speech; Training; Vectors; Automatic speech recognition; Deep learning; Noise robustness; feature enhancement;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Conference_Location :
Olomouc
Type :
conf
DOI :
10.1109/ASRU.2013.6707755
Filename :
6707755