Elastic spectral distortion for low resource speech recognition with deep neural networks

Author

Kanda, Natsuki ; Takeda, Ryu ; Obuchi, Yasunari

Author_Institution

Central Res. Lab., Hitachi Ltd., Kokubunji, Japan

fYear

2013

fDate

8-12 Dec. 2013

Firstpage

309

Lastpage

314

Abstract

An acoustic model based on hidden Markov models with deep neural networks (DNN-HMM) has recently been proposed and has achieved high recognition accuracy. In this paper, we investigated an elastic spectral distortion method that artificially augments training samples to help DNN-HMMs acquire enough robustness even when only a limited number of training samples is available. We investigated three distortion methods (vocal tract length distortion, speech rate distortion, and frequency-axis random distortion) and evaluated them on Japanese lecture recordings. In a large vocabulary continuous speech recognition task with only 10 hours of training samples, a DNN-HMM trained with the elastic spectral distortion method achieved a 10.1% relative word error reduction compared with a normally trained DNN-HMM.
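The three distortions named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the linear frequency warp, the linear-interpolation resampling, the moving-average smoothing of random offsets, and every function and parameter name below (`warp_axis`, `alpha`, `rate`, `max_shift`, `smooth`) are assumptions made for illustration. The input is assumed to be a spectrogram of shape (frames, frequency bins).

```python
import numpy as np

def warp_axis(spec, positions, axis):
    """Linearly interpolate `spec` along `axis` at fractional index `positions`."""
    moved = np.moveaxis(spec, axis, 0)
    n = moved.shape[0]
    pos = np.clip(np.asarray(positions, dtype=float), 0.0, n - 1)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    w = (pos - lo).reshape((-1,) + (1,) * (moved.ndim - 1))
    out = (1.0 - w) * moved[lo] + w * moved[hi]
    return np.moveaxis(out, 0, axis)

def vocal_tract_length_distortion(spec, alpha):
    """Assumed linear warp: output bin k reads input bin k * alpha."""
    n_bins = spec.shape[1]
    return warp_axis(spec, np.arange(n_bins) * alpha, axis=1)

def speech_rate_distortion(spec, rate):
    """Resample the time axis; rate > 1 yields fewer frames (faster speech)."""
    n_frames = spec.shape[0]
    n_new = max(2, int(round(n_frames / rate)))
    return warp_axis(spec, np.linspace(0.0, n_frames - 1, n_new), axis=0)

def frequency_axis_random_distortion(spec, max_shift, rng, smooth=9):
    """Shift each frequency bin by a smoothed random offset of at most `max_shift` bins."""
    n_bins = spec.shape[1]
    noise = rng.uniform(-1.0, 1.0, n_bins)
    kernel = np.ones(smooth) / smooth  # moving average keeps offsets within [-1, 1]
    offsets = max_shift * np.convolve(noise, kernel, mode="same")
    return warp_axis(spec, np.arange(n_bins) + offsets, axis=1)

# Placeholder data standing in for a real (frames, bins) spectrogram.
rng = np.random.default_rng(0)
spec = rng.random((100, 40))
augmented = [
    vocal_tract_length_distortion(spec, 1.1),
    speech_rate_distortion(spec, 0.9),
    frequency_axis_random_distortion(spec, 1.5, rng),
]
```

Each call returns a distorted copy that could be added to the training set alongside the original utterance. The distortion strengths shown here are arbitrary and are not taken from the paper.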

Keywords

acoustic signal processing; hidden Markov models; neural nets; spectral analysis; speech recognition; Japanese lecture recordings; acoustic model; artificially training sample augmentation; deep neural networks; elastic spectral distortion method; frequency-axis random distortion; hidden Markov models; low resource speech recognition; normally trained DNN-HMM; speech rate distortion; vocabulary continuous speech recognition task; vocal tract length distortion; Accuracy; Acoustic distortion; Acoustics; Hidden Markov models; Speech; Speech recognition; Training; Deep neural network; elastic distortion; speech recognition;

fLanguage

English

Publisher

IEEE

Conference_Titel

Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on

Conference_Location

Olomouc

Type

conf

DOI

10.1109/ASRU.2013.6707748

Filename

6707748