Title :
Robust recognition of children's speech
Author :
Potamianos, Alexandros ; Narayanan, Shrikanth
Author_Institution :
Dept. of Electron. & Comput. Eng., Tech. Univ. of Crete, Chania, Greece
Abstract :
Developmental changes in speech production introduce age-dependent spectral and temporal variability in the speech signal produced by children. Such variability poses challenges for robust automatic recognition of children's speech. Through an analysis of age-related acoustic characteristics of children's speech in the context of automatic speech recognition (ASR), effects such as frequency scaling of spectral envelope parameters are demonstrated. Recognition experiments using acoustic models trained on adult speech and tested against speech from children of various ages clearly show performance degradation with decreasing age. On average, word error rates are two to five times higher for children's speech than for adult speech. Various techniques for improving ASR performance on children's speech are reported. A speaker normalization algorithm that combines frequency warping and model transformation is shown to reduce acoustic variability and significantly improve ASR performance for child speakers (by 25-45% under various model training and testing conditions). The use of age-dependent acoustic models further reduces word error rate by 10%. The potential of using piecewise linear and phoneme-dependent frequency warping algorithms for reducing the variability in the acoustic feature space of children is also investigated.
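To illustrate the piecewise linear frequency warping mentioned in the abstract, the following is a minimal sketch and not the authors' algorithm. The abstract does not specify the warping function, so the two-segment form, the function name `piecewise_linear_warp`, the warping factor `alpha`, the 8 kHz band edge, and the `break_ratio` value are all assumptions chosen for illustration.

```python
def piecewise_linear_warp(freq_hz, alpha, nyquist_hz=8000.0, break_ratio=0.85):
    """Map a frequency through a two-segment piecewise linear warp.

    Hypothetical illustration: below the breakpoint the frequency is
    scaled by alpha; above it, a second linear segment joins the
    breakpoint to (nyquist_hz, nyquist_hz) so the band edge is preserved.
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= freq_hz <= nyquist_hz:
        raise ValueError("freq_hz must lie in [0, nyquist_hz]")

    # Assumed breakpoint rule: keep alpha * f0 strictly below the band edge.
    f0 = break_ratio * nyquist_hz * min(1.0, 1.0 / alpha)

    if freq_hz <= f0:
        # First segment: pure linear scaling of the frequency axis.
        return alpha * freq_hz

    # Second segment: straight line from (f0, alpha * f0) to the band edge.
    slope = (nyquist_hz - alpha * f0) / (nyquist_hz - f0)
    return alpha * f0 + slope * (freq_hz - f0)


if __name__ == "__main__":
    # alpha < 1 compresses the frequency axis, e.g. to move a child's
    # higher formants toward the range an adult-trained model expects.
    for f in (500.0, 1000.0, 3000.0, 7500.0, 8000.0):
        print(f, piecewise_linear_warp(f, alpha=0.8))
```

In this sketch a single `alpha` would be chosen per speaker; how that factor is estimated, and how warping is combined with model transformation, is described in the paper itself and is not reproduced here.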
Keywords :
error analysis; piecewise linear techniques; spectral analysis; speech recognition; acoustic models; age-dependent spectral variability; age-dependent temporal variability; age-related acoustic characteristics; automatic speech recognition; children speech recognition; formant scaling; frequency scaling; frequency warping; phoneme-dependent algorithm; piecewise linear algorithm; speaker normalization algorithm; spectral envelope parameters; vocal tract normalization; word error rate; Acoustic testing; Automatic speech recognition; Degradation; Error analysis; Frequency; Loudspeakers; Piecewise linear techniques; Robustness; Speech analysis; Speech recognition;
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
DOI :
10.1109/TSA.2003.818026