Title :
Automatic Intelligibility Assessment of Dysarthric Speech Using Phonologically-Structured Sparse Linear Model
Author :
Myung Jong Kim ; Younggwan Kim ; Hoirin Kim
Author_Institution :
Dept. of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Abstract :
This paper presents a new method for automatically assessing the speech intelligibility of patients with dysarthria, a motor speech disorder that impedes the physical production of speech. The proposed method consists of two main steps: feature representation and prediction. In the feature representation step, each speech utterance is converted into a phone sequence by automatic speech recognition and then aligned with the canonical phone sequence from a pronunciation dictionary using a weighted finite state transducer, capturing pronunciation mappings such as match, substitution, and deletion. Histograms of these pronunciation mappings over a pre-defined word set are used as features. In the prediction step, a structured sparse linear model is proposed that incorporates phonological knowledge and simultaneously performs phonologically structured sparse feature selection and intelligibility prediction. Evaluated on a database of 109 speakers (94 dysarthric and 15 control speakers), the proposed method achieved a root mean square error of 8.14 relative to subjectively rated scores on a scale of 0 to 100. This promising performance suggests that the system can be applied successfully to help speech therapists diagnose the degree of speech disorder.
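The two steps summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Unit-cost Levenshtein alignment stands in for the weighted finite state transducer. Mapping counts are pooled into four types instead of per-phone histograms over the word set. The phonologically structured sparse model is approximated by a group lasso whose feature groups (`groups`) are hypothetical phonological classes. All function names are illustrative.

```python
from collections import Counter

import numpy as np


def align_phones(canonical, recognized):
    """Unit-cost Levenshtein alignment (assumed stand-in for the paper's WFST).

    Returns (canonical_phone, recognized_phone) pairs; None on the recognized
    side marks a deletion, None on the canonical side marks an insertion.
    """
    n, m = len(canonical), len(recognized)
    # cost[i][j]: minimum edit cost of aligning canonical[:i] with recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if canonical[i - 1] == recognized[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # deletion
                             cost[i][j - 1] + 1)        # insertion
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if canonical[i - 1] == recognized[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                pairs.append((canonical[i - 1], recognized[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((canonical[i - 1], None))
            i -= 1
        else:
            pairs.append((None, recognized[j - 1]))
            j -= 1
    return pairs[::-1]


def mapping_histogram(pairs):
    """Count mapping types (simplified: pooled, not per phone or per word)."""
    def label(c, r):
        if r is None:
            return "deletion"
        if c is None:
            return "insertion"
        return "match" if c == r else "substitution"
    return Counter(label(c, r) for c, r in pairs)


def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal-gradient group lasso with an unpenalized intercept.

    Minimizes (1/2n)||y - b - Xw||^2 + lam * sum_g sqrt(|g|) * ||w_g||_2,
    where `groups` is a list of index lists (hypothetical phonological classes).
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering handles the intercept exactly
    n, d = Xc.shape
    step = n / np.linalg.norm(Xc, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = Xc.T @ (Xc @ w - yc) / n  # gradient of the squared-error term
        z = w - step * grad
        for g in groups:  # block soft-thresholding: shrink or zero each whole group
            norm = np.linalg.norm(z[g])
            thresh = step * lam * np.sqrt(len(g))
            z[g] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * z[g]
        w = z
    return w, y_mean - x_mean @ w


def rmse(pred, target):
    """Root mean square error between predicted and rated scores."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))
```

For example, aligning the canonical sequence `["s", "p", "iy", "ch"]` with the recognized `["s", "iy", "sh"]` yields two matches, one deletion (`p`), and one substitution (`ch` to `sh`). The group penalty zeroes an entire group of features at once, which is one simple way to make feature selection follow a phonological grouping; the abstract does not state that the paper uses this exact penalty.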
Keywords :
finite state machines; medical disorders; sparse matrices; speech; speech intelligibility; speech recognition; automatic dysarthric speech intelligibility assessment; automatic speech recognition technique; canonical phone sequence; intelligibility prediction; motor speech disorder; phonologically structured sparse feature selection; phonologically-structured sparse linear model; physical speech production; pronunciation dictionary; pronunciation mapping histograms; speech utterance; weighted finite state transducer; IEEE transactions; Medical treatment; Predictive models; Speech; Speech processing; Speech recognition; Transducers; Dysarthria; pronunciation confusion network; speech intelligibility assessment; structured sparse model; weighted finite state transducer (WFST);
Journal_Title :
IEEE/ACM Transactions on Audio, Speech, and Language Processing
DOI :
10.1109/TASLP.2015.2403619