DocumentCode
730670
Title
Differentiable pooling for unsupervised speaker adaptation
Author
Swietojanski, Pawel ; Renals, Steve
Author_Institution
Centre for Speech Technol. Res., Univ. of Edinburgh, Edinburgh, UK
fYear
2015
fDate
19-24 April 2015
Firstpage
4305
Lastpage
4309
Abstract
This paper proposes a differentiable pooling mechanism for model-based neural network speaker adaptation. The technique learns a speaker-dependent combination of activations within pools of hidden units, works well in an unsupervised setting, and does not require speaker-adaptive training. We conducted a set of experiments on TED talks data, as used in the IWSLT evaluations. The results indicate that the approach reduces word error rates (WERs) on standard IWSLT test sets by about 5-11% relative to speaker-independent systems. It is also complementary to the recently proposed learning hidden units contribution (LHUC) approach; combining the two reduces WER by 6-13% relative. Both methods also work well when adapting with small amounts of unsupervised data: 10 seconds of adaptation data is enough to decrease the WER by 5% relative to the baseline speaker-independent system.
Keywords
loudspeakers; IWSLT evaluations; LHUC; TED; WER; differentiable pooling mechanism; learning hidden units contribution; model-based neural network speaker adaptation; speaker-independent systems; unsupervised speaker adaptation; word error rates; Adaptation models; Artificial neural networks; Lead; Training; Deep Neural Networks; Differentiable pooling; Speaker Adaptation
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location
South Brisbane, QLD
Type
conf
DOI
10.1109/ICASSP.2015.7178783
Filename
7178783