Automatic Intelligibility Assessment of Dysarthric Speech Using Phonologically-Structured Sparse Linear Model

Author

Myung Jong Kim ; Younggwan Kim ; Hoirin Kim

Author_Institution

Dept. of Electr. Eng., Korea Adv. Inst. of Sci. & Technol., Daejeon, South Korea

Volume

23

Issue

4

fYear

2015

fDate

Apr-15

Firstpage

694

Lastpage

704

Abstract

This paper presents a new method for automatically assessing the speech intelligibility of patients with dysarthria, a motor speech disorder that impedes the physical production of speech. The proposed method consists of two main steps: feature representation and prediction. In the feature representation step, the speech utterance is converted into a phone sequence using automatic speech recognition. That sequence is then aligned with a canonical phone sequence from a pronunciation dictionary using a weighted finite state transducer, which captures pronunciation mappings such as match, substitution, and deletion. The histograms of the pronunciation mappings over a pre-defined word set are used as features. In the prediction step, a structured sparse linear model that incorporates phonological knowledge is proposed; it simultaneously performs phonologically structured sparse feature selection and intelligibility prediction. Evaluation of the proposed method on a database of 109 speakers (94 dysarthric and 15 control speakers) yielded a root mean square error of 8.14 against subjectively rated scores in the range of 0 to 100. This performance is promising and indicates that the system can be applied to help speech therapists diagnose the degree of speech disorder.
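
The feature representation step and the error metric described above can be illustrated with a minimal sketch. It is not the authors' implementation: a plain edit-distance alignment stands in for the weighted finite state transducer, the phone sequences are made-up examples, and the function names `align`, `mapping_histogram`, and `rmse` are hypothetical. The structured sparse linear model itself is not covered here.

```python
import math
from collections import Counter


def align(canonical, recognized):
    """Align two phone sequences by edit distance (assumed stand-in for a WFST).

    Returns (canonical_phone, recognized_phone) pairs. A pair with
    recognized_phone None is a deletion; canonical_phone None is an insertion.
    """
    n, m = len(canonical), len(recognized)
    # cost[i][j] = minimum edits to align canonical[:i] with recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (canonical[i - 1] != recognized[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the end to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and cost[i][j]
                == cost[i - 1][j - 1] + (canonical[i - 1] != recognized[j - 1])):
            pairs.append((canonical[i - 1], recognized[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((canonical[i - 1], None))  # deletion
            i -= 1
        else:
            pairs.append((None, recognized[j - 1]))  # insertion
            j -= 1
    pairs.reverse()
    return pairs


def mapping_histogram(word_pairs):
    """Count pronunciation mappings over (canonical, recognized) word pairs."""
    hist = Counter()
    for canonical, recognized in word_pairs:
        hist.update(align(canonical, recognized))
    return hist


def rmse(predicted, reference):
    """Root mean square error between predicted and reference scores."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, `mapping_histogram([(["k", "ae", "t"], ["g", "ae"])])` counts one substitution (`k` to `g`), one match (`ae`), and one deletion (`t`). In a full system these counts would form the feature vector passed to the prediction model.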

Keywords

finite state machines; medical disorders; sparse matrices; speech; speech intelligibility; speech recognition; automatic dysarthric speech intelligibility assessment; automatic speech recognition technique; canonical phone sequence; intelligibility prediction; motor speech disorder; phonologically structured sparse feature selection; phonologically-structured sparse linear model; physical speech production; pronunciation dictionary; pronunciation mapping histograms; speech utterance; weighted finite state transducer; IEEE transactions; Medical treatment; Predictive models; Speech; Speech processing; Speech recognition; Transducers; Dysarthria; pronunciation confusion network; speech intelligibility assessment; structured sparse model; weighted finite state transducer (WFST);

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher

IEEE

ISSN

2329-9290

Type

jour

DOI

10.1109/TASLP.2015.2403619

Filename

7041211