Title :
Fast speaker adaptation of artificial neural networks for automatic speech recognition
Author :
Dupont, Stéphane ; Cheboub, Leila
Author_Institution :
TCTS-MULTITEL, Faculté Polytech. de Mons, Belgium
Abstract :
This paper presents a fast speaker adaptation technique for automatic speech recognition systems that use artificial neural networks (ANNs) to estimate hidden Markov model (HMM) state probabilities. Speaker-adapted ANNs are first obtained from the training data using affine transformations in the feature space. As in the “eigenvoice” approach, principal component analysis (PCA) is then applied to these transformation matrices. The first few eigenvectors span a low-dimensional space that captures most of the inter-speaker variability of the training set. During operation, these eigenvectors can be used to constrain the optimization of the transformation matrices for new speakers. This optimization is performed by steepest descent, with gradients obtained by backpropagation through the speaker-independent ANN. The experiments use state-of-the-art hybrid HMM/ANN systems trained on the Phonebook database. Supervised adaptation experiments with different amounts of adaptation data show that the new technique performs better than standard linear regression in the feature space: with only 20 words of adaptation data, results show a 15% relative decrease in the word error rate.
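The constrained adaptation described in the abstract can be sketched in a few lines. This is a minimal, hypothetical toy in Python with NumPy, not the authors' implementation: the speaker-independent ANN is replaced by a single frozen softmax layer, the training-speaker transforms are synthetic rather than estimated, the sizes `D`, `C`, `S`, `K` and all helper names are arbitrary, and the step-halving line search is an assumption added on top of plain steepest descent. It shows the three ingredients: PCA over flattened affine transforms `[A | t]`, a new-speaker transform restricted to the mean plus `K` eigenvectors, and gradients for the `K` weights obtained by backpropagating the cross-entropy through the frozen network and projecting onto the eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)
D, C, S, K = 4, 3, 20, 3  # feature dim, classes, training speakers, kept eigenvectors (toy sizes)

# Hypothetical stand-in for the frozen speaker-independent ANN: one softmax layer.
V = rng.normal(size=(C, D))
b = rng.normal(size=C)


def si_log_posteriors(x):
    """Log class posteriors of the frozen SI network for features x of shape (N, D)."""
    z = x @ V.T + b
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


# 1) One affine transform [A | t] of shape (D, D+1) per training speaker, flattened.
#    Synthetic placeholders for transforms that would be estimated on real training speakers.
transforms = np.stack([
    np.hstack([np.eye(D) + 0.1 * rng.normal(size=(D, D)),
               0.1 * rng.normal(size=(D, 1))]).ravel()
    for _ in range(S)
])

# 2) PCA: mean transform plus the K leading eigenvectors of the centred transforms.
mean = transforms.mean(axis=0)
_, _, Vt = np.linalg.svd(transforms - mean, full_matrices=False)
E = Vt[:K]  # (K, D*(D+1)), orthonormal rows


def transform_from_weights(c):
    """Constrained transform: mean + sum_k c_k * eigenvector_k, reshaped to (D, D+1)."""
    return (mean + c @ E).reshape(D, D + 1)


def loss_and_grad(c, x, y):
    """Cross-entropy of the SI network on adapted features, and its gradient w.r.t. c."""
    W = transform_from_weights(c)
    A, t = W[:, :D], W[:, D]
    xt = x @ A.T + t                      # adapted features, (N, D)
    logp = si_log_posteriors(xt)
    n = x.shape[0]
    loss = -logp[np.arange(n), y].mean()
    # Backpropagate through the frozen softmax layer down to the transform.
    g_z = np.exp(logp)
    g_z[np.arange(n), y] -= 1.0
    g_z /= n                              # dL/dz, (N, C)
    g_xt = g_z @ V                        # dL/d(adapted features), (N, D)
    g_W = np.hstack([g_xt.T @ x, g_xt.sum(axis=0)[:, None]]).ravel()
    return loss, E @ g_W                  # chain rule onto the K weights


def adapt(x, y, steps=100, lr=1.0):
    """Steepest descent on the K weights, halving the step until the loss decreases."""
    c = np.zeros(K)                       # start from the mean transform
    loss, g = loss_and_grad(c, x, y)
    for _ in range(steps):
        step = lr
        while step > 1e-8:
            c_new = c - step * g
            loss_new, g_new = loss_and_grad(c_new, x, y)
            if loss_new < loss:
                c, loss, g = c_new, loss_new, g_new
                break
            step *= 0.5
        else:
            break                         # no descent step found: stop
    return c


# Usage: supervised adaptation data from a hypothetical distorted "new speaker".
x_clean = rng.normal(size=(50, D))
y = si_log_posteriors(x_clean).argmax(axis=1)   # labels known (supervised adaptation)
x = 0.7 * x_clean + 0.5                         # hypothetical speaker-dependent distortion
c = adapt(x, y)
print("loss before:", loss_and_grad(np.zeros(K), x, y)[0])
print("loss after: ", loss_and_grad(c, x, y)[0])
```

The point of the constraint is that only `K` weights are estimated for the new speaker instead of the `D*(D+1)` entries of a full affine transform, which is why such a constraint can help when only a few words of adaptation data are available.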
Keywords :
backpropagation; eigenvalues and eigenfunctions; estimation theory; hidden Markov models; neural nets; optimisation; principal component analysis; probability; speech recognition; HMMs state probability estimation; Phonebook database; affine transformations; artificial neural networks; automatic speech recognition; backpropagation; eigenvectors; eigenvoice; fast speaker adaptation; gradients; hidden Markov models; inter-speaker variability; optimization; performance; principal components analysis; speaker independent ANN; steepest descent; training data; training set; transformation matrices; word error rate; Artificial neural networks; Automatic speech recognition; Backpropagation; Constraint optimization; Hidden Markov models; Linear regression; Principal component analysis; Spatial databases; State estimation; Training data;
Conference_Title :
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
Conference_Location :
Istanbul
Print_ISBN :
0-7803-6293-4
DOI :
10.1109/ICASSP.2000.862102