• DocumentCode
    64459
  • Title
    Automatic Intelligibility Assessment of Dysarthric Speech Using Phonologically-Structured Sparse Linear Model
  • Author
    Myung Jong Kim ; Younggwan Kim ; Hoirin Kim
  • Author_Institution
    Dept. of Electr. Eng., Korea Adv. Inst. of Sci. & Technol., Daejeon, South Korea
  • Volume
    23
  • Issue
    4
  • fYear
    2015
  • fDate
    Apr-15
  • Firstpage
    694
  • Lastpage
    704
  • Abstract
    This paper presents a new method for automatically assessing the speech intelligibility of patients with dysarthria, a motor speech disorder that impedes the physical production of speech. The proposed method consists of two main steps: feature representation and prediction. In the feature representation step, the speech utterance is converted into a phone sequence using automatic speech recognition and is then aligned with a canonical phone sequence from a pronunciation dictionary using a weighted finite state transducer to capture pronunciation mappings such as match, substitution, and deletion. Histograms of the pronunciation mappings over a pre-defined word set serve as features. In the prediction step, a structured sparse linear model incorporating phonological knowledge is proposed; it simultaneously performs phonologically structured sparse feature selection and intelligibility prediction. Evaluated on a database of 109 speakers (94 dysarthric and 15 control), the proposed method yielded a root mean square error of 8.14 against subjectively rated scores in the range of 0 to 100. This promising performance indicates that the system can be applied to help speech therapists diagnose the degree of speech disorder.
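    The feature representation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a plain Levenshtein alignment stands in for the weighted finite state transducer, the function names `align_phones` and `mapping_histogram` are hypothetical, and the "insertion" category is an assumption added so that every alignment outcome is counted (the abstract names only match, substitution, and deletion). The phone symbols in the usage note are illustrative only.

```python
from collections import Counter


def align_phones(canonical, recognized):
    """Align two phone lists by edit distance (a stand-in for WFST alignment).

    Returns a list of (canonical_phone, recognized_phone) pairs,
    where None marks a gap on that side.
    """
    n, m = len(canonical), len(recognized)
    # cost[i][j] = edit distance between canonical[:i] and recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (canonical[i - 1] != recognized[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1]
                + (canonical[i - 1] != recognized[j - 1])):
            pairs.append((canonical[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((canonical[i - 1], None))  # canonical phone dropped
            i -= 1
        else:
            pairs.append((None, recognized[j - 1]))  # extra recognized phone
            j -= 1
    pairs.reverse()
    return pairs


def mapping_histogram(pairs):
    """Count how often each mapping type occurs in an alignment."""
    hist = Counter()
    for canon, recog in pairs:
        if canon is None:
            hist["insertion"] += 1
        elif recog is None:
            hist["deletion"] += 1
        elif canon == recog:
            hist["match"] += 1
        else:
            hist["substitution"] += 1
    return hist
```

    For example, `mapping_histogram(align_phones(["k", "ae", "t"], ["k", "t"]))` counts two matches and one deletion. In the paper's setting such counts would be accumulated over a pre-defined word set and passed to the structured sparse linear model, which this sketch does not attempt to reproduce.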
  • Keywords
    finite state machines; medical disorders; sparse matrices; speech; speech intelligibility; speech recognition; automatic dysarthric speech intelligibility assessment; automatic speech recognition technique; canonical phone sequence; intelligibility prediction; motor speech disorder; phonologically structured sparse feature selection; phonologically-structured sparse linear model; physical speech production; pronunciation dictionary; pronunciation mapping histograms; speech utterance; weighted finite state transducer; IEEE transactions; Medical treatment; Predictive models; Speech; Speech processing; Speech recognition; Transducers; Dysarthria; pronunciation confusion network; speech intelligibility assessment; structured sparse model; weighted finite state transducer (WFST);
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2015.2403619
  • Filename
    7041211