Title :
Dysarthric speech recognition using a convolutive bottleneck network
Author :
Nakashika, Toru ; Yoshioka, Takashi ; Takiguchi, Tetsuya ; Ariki, Yasuo ; Duffner, Stefan ; Garcia, Christophe
Author_Institution :
Graduate School of System Informatics, Kobe University, Kobe, Japan
Abstract :
In this paper, we investigate the recognition of speech produced by a person with an articulation disorder resulting from athetoid cerebral palsy. The articulation of the first spoken words tends to become unstable due to strain on the speech muscles, which degrades the performance of traditional speech recognition systems. We therefore propose a robust feature extraction method that uses a convolutive bottleneck network (CBN) in place of the well-known MFCC features. The CBN stacks layers of several types, such as a convolution layer, a subsampling layer, and a bottleneck layer, to form a deep network. By applying the CBN to feature extraction for dysarthric speech, we expect it to reduce the influence of the unstable speaking style caused by the athetoid symptoms. We confirmed its effectiveness through word-recognition experiments, in which the CBN-based feature extraction method outperformed the conventional feature extraction method.
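The abstract describes the CBN architecture only at a high level: convolution and subsampling layers followed by a narrow bottleneck layer whose activations replace MFCCs as input features. The following is a minimal sketch of such a network in Python with PyTorch, written from that description alone; the class name ConvolutiveBottleneckNetwork, the input patch shape, the layer sizes, the activation functions, and the class count are illustrative assumptions, not the configuration reported in the paper.

# Minimal sketch of a convolutive bottleneck network (CBN) feature
# extractor, assuming mel-spectrogram patches of shape
# (batch, 1, n_mels, n_frames). All sizes below are illustrative
# assumptions, not the settings used by the authors.
import torch
import torch.nn as nn

class ConvolutiveBottleneckNetwork(nn.Module):
    """Convolution -> subsampling -> bottleneck feature extractor (sketch)."""

    def __init__(self, n_mels=40, n_frames=11, n_bottleneck=30, n_classes=54):
        super().__init__()
        # Convolution and subsampling (pooling) stages capture local
        # time-frequency patterns and blur small temporal/spectral shifts.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),
            nn.Tanh(),
            nn.MaxPool2d(kernel_size=2),            # subsampling layer
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
            nn.Tanh(),
            nn.MaxPool2d(kernel_size=2),            # subsampling layer
        )
        flat_dim = 32 * (n_mels // 4) * (n_frames // 4)
        # Narrow "bottleneck" layer whose activations are used as the
        # robust features in place of MFCCs.
        self.bottleneck = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat_dim, 128),
            nn.Tanh(),
            nn.Linear(128, n_bottleneck),
            nn.Tanh(),
        )
        # Output layer used only while training the network on frame
        # labels; it is discarded at feature-extraction time.
        self.classifier = nn.Linear(n_bottleneck, n_classes)

    def forward(self, x):
        return self.classifier(self.bottleneck(self.conv(x)))

    def extract_features(self, x):
        # Bottleneck activations serve as frame-level features for the
        # downstream word recognizer.
        with torch.no_grad():
            return self.bottleneck(self.conv(x))

if __name__ == "__main__":
    net = ConvolutiveBottleneckNetwork()
    patch = torch.randn(8, 1, 40, 11)          # dummy mel-spectrogram patches
    print(net.extract_features(patch).shape)   # torch.Size([8, 30])

In this sketch the network is trained through the classifier head and then truncated at the bottleneck, so each input patch yields a compact feature vector that can be fed to a conventional recognizer in place of MFCCs.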
Keywords :
convolution; feature extraction; medical disorders; muscle; speech recognition; articulation disorder; athetoid cerebral palsy; bottleneck layer; convolution layer; convolutive bottleneck network; deep network; dysarthric speech recognition; feature extraction method; speech muscles strain; speech recognition systems; subsampling layer; unstable speaking style; word-recognition experiments; Accuracy; Convolution; Feature extraction; Mel frequency cepstral coefficient; Robustness; Speech; Speech recognition; Articulation disorders; bottleneck feature; convolutional neural network; dysarthric speech; feature extraction
Conference_Title :
2014 12th International Conference on Signal Processing (ICSP)
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4799-2188-1
DOI :
10.1109/ICOSP.2014.7015056