DocumentCode :
672371
Title :
Elastic spectral distortion for low resource speech recognition with deep neural networks
Author :
Kanda, Natsuki ; Takeda, Ryu ; Obuchi, Yasunari
Author_Institution :
Central Res. Lab., Hitachi Ltd., Kokubunji, Japan
fYear :
2013
fDate :
8-12 Dec. 2013
Firstpage :
309
Lastpage :
314
Abstract :
An acoustic model based on hidden Markov models with deep neural networks (DNN-HMM) has recently been proposed and has achieved high recognition accuracy. In this paper, we investigated an elastic spectral distortion method that artificially augments the training samples, helping DNN-HMMs acquire sufficient robustness even when only a limited number of training samples is available. We investigated three distortion methods (vocal tract length distortion, speech rate distortion, and frequency-axis random distortion) and evaluated them on Japanese lecture recordings. In a large vocabulary continuous speech recognition task with only 10 hours of training data, a DNN-HMM trained with the elastic spectral distortion method achieved a 10.1% relative reduction in word error rate compared with a conventionally trained DNN-HMM.
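The abstract names the three distortions but does not give their exact formulation. The sketch below is a minimal NumPy illustration built on our own assumptions, not the authors' implementation. It assumes a (frames x bins) spectrogram, models vocal tract length distortion as a linear warp of the frequency axis, models speech rate distortion as linear resampling of the time axis, and models frequency-axis random distortion as a Gaussian-smoothed random displacement field. All function names and parameters (`alpha`, `rate`, `scale`, `sigma`) are hypothetical.

```python
import numpy as np


def warp_frequency(spec: np.ndarray, alpha: float) -> np.ndarray:
    """Vocal-tract-length-style distortion (hypothetical): linearly rescale the
    frequency axis of a (frames x bins) spectrogram by the factor `alpha`."""
    bins = np.arange(spec.shape[1], dtype=float)
    # Output bin f reads input position f / alpha, so alpha > 1 stretches the
    # spectrum toward higher bins; np.interp clamps positions past the last bin.
    src = bins / alpha
    return np.stack([np.interp(src, bins, frame) for frame in spec])


def change_speech_rate(spec: np.ndarray, rate: float) -> np.ndarray:
    """Speech-rate distortion (hypothetical): linearly resample the time axis;
    rate > 1 yields fewer frames (faster speech), rate < 1 yields more."""
    num_frames = spec.shape[0]
    new_len = max(1, int(round(num_frames / rate)))
    frames = np.arange(num_frames, dtype=float)
    src = np.linspace(0.0, num_frames - 1, new_len)
    # Interpolate each frequency bin independently along time.
    columns = [np.interp(src, frames, spec[:, f]) for f in range(spec.shape[1])]
    return np.stack(columns, axis=1)


def _smooth(v: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Centered convolution that always returns len(v) samples."""
    radius = len(kernel) // 2
    return np.convolve(v, kernel, mode="full")[radius:radius + len(v)]


def random_frequency_distortion(spec: np.ndarray, scale: float, sigma: float,
                                rng: np.random.Generator) -> np.ndarray:
    """Frequency-axis random distortion (hypothetical): shift every bin by a
    smooth random displacement of at most `scale` bins."""
    num_frames, num_bins = spec.shape
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    # Uniform noise in [-1, 1), smoothed along time and frequency so that
    # neighbouring cells move together (an "elastic" rather than jittery warp).
    field = rng.uniform(-1.0, 1.0, size=(num_frames, num_bins))
    field = np.apply_along_axis(_smooth, 0, field, kernel)
    field = np.apply_along_axis(_smooth, 1, field, kernel)
    bins = np.arange(num_bins, dtype=float)
    return np.stack([np.interp(bins + scale * d, bins, frame)
                     for d, frame in zip(field, spec)])


# Usage: create three distorted copies of one toy spectrogram.
rng = np.random.default_rng(0)
spec = rng.random((50, 40))  # toy spectrogram: 50 frames x 40 bins
augmented = [
    warp_frequency(spec, alpha=1.1),
    change_speech_rate(spec, rate=1.2),
    random_frequency_distortion(spec, scale=2.0, sigma=2.0, rng=rng),
]
print([a.shape for a in augmented])  # [(50, 40), (42, 40), (50, 40)]
```

Each function returns a new array, so the original sample and its distorted copies can all be added to the training set. The parameter ranges the paper actually used for the warping factor, rate, and displacement are not given in this record.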
Keywords :
acoustic signal processing; hidden Markov models; neural nets; spectral analysis; speech recognition; Japanese lecture recordings; acoustic model; artificial training sample augmentation; deep neural networks; elastic spectral distortion method; frequency-axis random distortion; hidden Markov models; low resource speech recognition; normally trained DNN-HMM; speech rate distortion; large vocabulary continuous speech recognition task; vocal tract length distortion; Accuracy; Acoustic distortion; Acoustics; Hidden Markov models; Speech; Speech recognition; Training; Deep neural network; elastic distortion; speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Conference_Location :
Olomouc
Type :
conf
DOI :
10.1109/ASRU.2013.6707748
Filename :
6707748