Title :
Dysarthric vocal interfaces with minimal training data
Author :
Gemmeke, Jort F. ; Sehgal, Siddharth ; Cunningham, Stuart ; Van hamme, Hugo
Author_Institution :
ESAT-PSI, KU Leuven, Leuven, Belgium
Abstract :
Over the past decade, several speech-based electronic assistive technologies (EATs) have been developed that target users with dysarthric speech. These EATs include vocal command & control systems as well as voice-input voice-output communication aids (VIVOCAs). In these systems, the vocal interfaces are based on automatic speech recognition (ASR), an approach that requires substantial training data and detailed annotation. In this work we evaluate an alternative approach, which mines utterance-based representations of speech for recurrent acoustic patterns, with the goal of achieving usable recognition accuracies with less speaker-specific training data. Comparisons with a conventional ASR system on dysarthric speech databases show that the proposed approach offers a substantial reduction in the amount of training data needed to achieve the same recognition accuracies.
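The keywords identify the pattern-mining technique as non-negative matrix factorisation (NMF). As a minimal illustrative sketch only (not the authors' implementation), NMF can decompose a non-negative matrix of utterance-level acoustic feature vectors into a small set of recurrent parts and their per-utterance activations; all variable names and the synthetic data below are assumptions for illustration:

```python
import numpy as np

def nmf(V, rank, n_iter=300, eps=1e-9, seed=0):
    """Factorise a non-negative matrix V (features x utterances) as V ~ W @ H
    using multiplicative updates that minimise squared Euclidean error.
    W holds recurrent acoustic patterns; H holds their activations per utterance."""
    rng = np.random.default_rng(seed)
    n_feat, n_utt = V.shape
    W = rng.random((n_feat, rank)) + eps
    H = rng.random((rank, n_utt)) + eps
    for _ in range(n_iter):
        # Standard Lee-Seung multiplicative update rules (keep W, H non-negative).
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy usage: synthetic "utterances" built from two latent patterns.
rng = np.random.default_rng(1)
true_patterns = rng.random((20, 2))          # 20-dim features, 2 patterns
true_activations = rng.random((2, 30))       # 30 utterances
V = true_patterns @ true_activations
W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In a command-recognition setting, each column of H would summarise which recurrent patterns an utterance activates; a new utterance could then be mapped to the command whose activation profile it most resembles.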
Keywords :
audio databases; data mining; speech recognition; ASR; EAT; VIVOCA; automatic speech recognition systems; dysarthric speech databases; dysarthric vocal interfaces; minimal training data; recurrent acoustic patterns; speaker-specific training data; speech-based electronic assistive technologies; utterance based speech representations mining; vocal command & control systems; vocal interfaces; voice-input voice-output communication aids; Accuracy; Filter banks; Hidden Markov models; dysarthric speech; non-negative matrix factorisation; vocal user interface;
Conference_Title :
2014 IEEE Spoken Language Technology Workshop (SLT)
DOI :
10.1109/SLT.2014.7078582