DocumentCode
672371
Title
Elastic spectral distortion for low resource speech recognition with deep neural networks
Author
Kanda, Natsuki ; Takeda, Ryu ; Obuchi, Yasunari
Author_Institution
Central Res. Lab., Hitachi Ltd., Kokubunji, Japan
fYear
2013
fDate
8-12 Dec. 2013
Firstpage
309
Lastpage
314
Abstract
An acoustic model based on hidden Markov models with deep neural networks (DNN-HMM) has recently been proposed and has achieved high recognition accuracy. In this paper, we investigate an elastic spectral distortion method that artificially augments training samples to help DNN-HMMs acquire sufficient robustness even when only a limited number of training samples is available. We investigate three distortion methods (vocal tract length distortion, speech rate distortion, and frequency-axis random distortion) and evaluate them on Japanese lecture recordings. In a large vocabulary continuous speech recognition task with only 10 hours of training samples, a DNN-HMM trained with the elastic spectral distortion method achieved a 10.1% relative word error reduction compared with a normally trained DNN-HMM.
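The abstract names three distortions but gives no formulas, so the following is only a rough sketch of how such spectrogram-level augmentation might look, not the authors' implementation. It assumes a spectrogram stored as a NumPy array of shape (frames, frequency bins). The function names, the linear warping rule, and the parameters `alpha`, `rate`, `max_shift`, and `n_anchors` are all hypothetical choices made for illustration.

```python
import numpy as np


def warp_frequency_axis(spec, alpha):
    """Vocal-tract-length-style distortion (assumed linear warp).

    Output bin k reads from source position k / alpha, clipped to the
    valid bin range, using linear interpolation within each frame.
    """
    n_bins = spec.shape[1]
    bins = np.arange(n_bins)
    src = np.clip(bins / alpha, 0, n_bins - 1)
    return np.stack([np.interp(src, bins, frame) for frame in spec])


def warp_time_axis(spec, rate):
    """Speech-rate-style distortion (assumed uniform resampling in time).

    rate > 1 yields fewer frames (faster speech), rate < 1 yields more.
    """
    n_frames, n_bins = spec.shape
    n_out = max(1, int(round(n_frames / rate)))
    frames = np.arange(n_frames)
    src = np.linspace(0, n_frames - 1, n_out)
    return np.stack(
        [np.interp(src, frames, spec[:, k]) for k in range(n_bins)], axis=1
    )


def random_frequency_distortion(spec, rng, max_shift=2.0, n_anchors=5):
    """Frequency-axis random distortion (assumed smooth random displacement).

    Random shifts are drawn at a few anchor bins, pinned to zero at both
    ends, and linearly interpolated to give a per-bin displacement.
    """
    n_bins = spec.shape[1]
    bins = np.arange(n_bins)
    anchors = np.linspace(0, n_bins - 1, n_anchors)
    shifts = rng.uniform(-max_shift, max_shift, n_anchors)
    shifts[0] = shifts[-1] = 0.0
    src = np.clip(bins + np.interp(bins, anchors, shifts), 0, n_bins - 1)
    return np.stack([np.interp(src, bins, frame) for frame in spec])


# Usage: derive three distorted copies of one (random, stand-in) spectrogram.
rng = np.random.default_rng(0)
spec = rng.random((20, 40))
augmented = [
    warp_frequency_axis(spec, alpha=1.1),
    warp_time_axis(spec, rate=2.0),
    random_frequency_distortion(spec, rng),
]
```

In a setup like this, each distorted copy would keep the transcript of the original utterance, which is what lets the distorted data enlarge a small training set.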
Keywords
acoustic signal processing; hidden Markov models; neural nets; spectral analysis; speech recognition; Japanese lecture recordings; acoustic model; artificially training sample augmentation; deep neural networks; elastic spectral distortion method; frequency-axis random distortion; hidden Markov models; low resource speech recognition; normally trained DNN-HMM; speech rate distortion; vocabulary continuous speech recognition task; vocal tract length distortion; Accuracy; Acoustic distortion; Acoustics; Hidden Markov models; Speech; Speech recognition; Training; Deep neural network; elastic distortion; speech recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Conference_Location
Olomouc
Type
conf
DOI
10.1109/ASRU.2013.6707748
Filename
6707748