Title :
Dysarthric speech recognition using a convolutive bottleneck network
Author :
Nakashika, Toru ; Yoshioka, Takashi ; Takiguchi, Tetsuya ; Ariki, Yasuo ; Duffner, Stefan ; Garcia, Christophe
Author_Institution :
Graduate School of System Informatics, Kobe University, Kobe, Japan
Abstract :
In this paper, we investigate the recognition of speech produced by a person with an articulation disorder resulting from athetoid cerebral palsy. The articulation of the first spoken words tends to become unstable due to strain on the speech muscles, which degrades the performance of traditional speech recognition systems. We therefore propose a robust feature extraction method that uses a convolutive bottleneck network (CBN) in place of the well-known MFCC features. The CBN stacks layers of several types, such as a convolution layer, a subsampling layer, and a bottleneck layer, to form a deep network. By applying the CBN to feature extraction for dysarthric speech, we expect it to reduce the influence of the unstable speaking style caused by the athetoid symptoms. We confirmed its effectiveness through word-recognition experiments, in which the CBN-based feature extraction method outperformed the conventional feature extraction method.
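The abstract describes the CBN architecture only at a high level: convolution and subsampling layers followed by a narrow bottleneck layer whose activations replace MFCCs as input features. The following is a minimal sketch of such a network in Python with PyTorch, written from that description alone; the class name ConvolutiveBottleneckNetwork, the input patch shape, the layer sizes, the activation functions, and the class count are illustrative assumptions, not the configuration reported in the paper.

# Minimal sketch of a convolutive bottleneck network (CBN) feature
# extractor, assuming mel-spectrogram patches of shape
# (batch, 1, n_mels, n_frames). All sizes below are illustrative
# assumptions, not the settings used by the authors.
import torch
import torch.nn as nn

class ConvolutiveBottleneckNetwork(nn.Module):
    """Convolution -> subsampling -> bottleneck feature extractor (sketch)."""

    def __init__(self, n_mels=40, n_frames=11, n_bottleneck=30, n_classes=54):
        super().__init__()
        # Convolution and subsampling (pooling) stages capture local
        # time-frequency patterns and blur small temporal/spectral shifts.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),
            nn.Tanh(),
            nn.MaxPool2d(kernel_size=2),            # subsampling layer
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
            nn.Tanh(),
            nn.MaxPool2d(kernel_size=2),            # subsampling layer
        )
        flat_dim = 32 * (n_mels // 4) * (n_frames // 4)
        # Narrow "bottleneck" layer whose activations are used as the
        # robust features in place of MFCCs.
        self.bottleneck = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat_dim, 128),
            nn.Tanh(),
            nn.Linear(128, n_bottleneck),
            nn.Tanh(),
        )
        # Output layer used only while training the network on frame
        # labels; it is discarded at feature-extraction time.
        self.classifier = nn.Linear(n_bottleneck, n_classes)

    def forward(self, x):
        return self.classifier(self.bottleneck(self.conv(x)))

    def extract_features(self, x):
        # Bottleneck activations serve as frame-level features for the
        # downstream word recognizer.
        with torch.no_grad():
            return self.bottleneck(self.conv(x))

if __name__ == "__main__":
    net = ConvolutiveBottleneckNetwork()
    patch = torch.randn(8, 1, 40, 11)          # dummy mel-spectrogram patches
    print(net.extract_features(patch).shape)   # torch.Size([8, 30])

In this sketch the network is trained through the classifier head and then truncated at the bottleneck, so each input patch yields a compact feature vector that can be fed to a conventional recognizer in place of MFCCs.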
Keywords :
convolution; feature extraction; medical disorders; muscle; speech recognition; articulation disorder; athetoid cerebral palsy; bottleneck layer; convolution layer; convolutive bottleneck network; deep network; dysarthric speech recognition; feature extraction method; speech muscles strain; speech recognition systems; subsampling layer; unstable speaking style; word-recognition experiments; Accuracy; Convolution; Feature extraction; Mel frequency cepstral coefficient; Robustness; Speech; Speech recognition; Articulation disorders; bottleneck feature; convolutional neural network; dysarthric speech; feature extraction
Conference_Title :
2014 12th International Conference on Signal Processing (ICSP)
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4799-2188-1
DOI :
10.1109/ICOSP.2014.7015056