DocumentCode :
3752152
Title :
Improving bottleneck features for automatic speech recognition using gammatone-based cochleagram and sparsity regularization
Author :
Chao Ma;Jun Qi;Dongmei Li;Runsheng Liu
Author_Institution :
Department of Electronic Engineering, Tsinghua University, Beijing, China, 100084
fYear :
2015
Firstpage :
63
Lastpage :
67
Abstract :
Bottleneck (BN) features, particularly those based on deep neural network structures, have been successfully applied to Automatic Speech Recognition (ASR) tasks. This paper continues the study of improving BN features for ASR tasks with two methods: (1) using a Cochleagram generated by Gammatone filters as the input feature for a deep neural network; (2) imposing sparsity regularization on the bottleneck layer, which controls the sparsity level of the BN features by constraining the activations of the hidden units to be inactive, on average, most of the time. Our experiments on the Wall Street Journal (WSJ) database demonstrate that both approaches deliver some performance gains for BN features on ASR tasks. In addition, further experiments on the WSJ database at different noise levels show that the Cochleagram input is more robust to noise than the commonly used Mel-scaled filterbank.
Keywords :
"Neural networks","Frequency modulation","Indexes","Cost function","Automatic speech recognition","Mel frequency cepstral coefficient"
Publisher :
IEEE
Conference_Title :
2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
Type :
conf
DOI :
10.1109/APSIPA.2015.7415401
Filename :
7415401