Automatic selection of speakers for improved acoustic modelling: recognition of disordered speech with sparse data

Author

Christensen, H. ; Casanueva, I. ; Cunningham, S. ; Green, P. ; Hain, T.

Author_Institution

Dept. of Comput. Sci., Univ. of Sheffield, Sheffield, UK

fYear

2014

Firstpage

254

Lastpage

259

Abstract

The automatic recognition of disordered speech is a domain that is characterised by limited amounts of training data for each speaker and large intra- and inter-speaker variations. This paper is concerned with how best to train acoustic models in these circumstances; in particular, we look at how to select data for a background model from a pool of speakers for a given target speaker. We show that rather than including data from all available speakers (the standard approach in the typical speech domain), significantly better accuracy can be achieved by carefully selecting which speakers should contribute. Different methods based on measuring acoustic closeness between speakers and ranking them accordingly are investigated, and on the UASpeech isolated word recognition task, we achieve an 11.5% relative improvement compared to the baseline, which uses data from all speakers. Accuracies for speakers with moderate to severe impairments are shown to improve the most, with one speaker classed as having 'low' intelligibility gaining a 60% relative improvement in accuracy.

Keywords

acoustic signal processing; handicapped aids; speaker recognition; acoustic modelling; automatic speaker selection; disordered speech recognition; dysarthric speech; physical disability; sparse data; Accuracy; Adaptation models; Data models; Silicon; Speech; Speech recognition; Training; recognition of dysarthric speech; sparse data; speaker selection

fLanguage

English

Publisher

ieee

Conference_Titel

Spoken Language Technology Workshop (SLT), 2014 IEEE

Type

conf

DOI

10.1109/SLT.2014.7078583

Filename

7078583