DocumentCode :
394307
Title :
Temporal structure constrained transformation for speaker adaptation
Author :
Choi, Eric H C ; Holter, Trym ; Epps, Julien ; Gopalakrishnan, Arm
Volume :
1
fYear :
2003
fDate :
6-10 April 2003
Abstract :
We suggest that, rather than modeling speaker mismatch as an affine transform of the entire feature vector, it can be modeled as an affine transform of the static coefficients, with additional constraints imposed by the temporal relationships among the coefficient streams. As a result, the different streams share the same rotation matrix. This reduces the complexity and memory requirements of speaker adaptation and minimizes the amount of adaptation data required. We present the solution for the case where the temporal structure constrained transforms (TSCT) are optimized under the maximum likelihood criterion. Experiments on the Wall Street Journal (WSJ) task show that, with the proposed approach, the same post-adaptation accuracy can be achieved using only 60% of the transformation parameters that a conventional block-diagonal transformation would require. In addition, TSCT provides better recognition accuracy when only a very limited amount of adaptation data is available.
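The following small numerical check in Python illustrates why the static, delta, and delta-delta streams can share one rotation matrix. It rests on an assumption the record does not state: that the dynamic coefficients are computed with the common regression formula over ±K static frames, Δc_t = Σ_k w_k c_{t+k}, whose weights sum to zero. Under that assumption, transforming every static frame as A c + b gives Σ_k w_k (A c_{t+k} + b) = A Δc_t + b Σ_k w_k = A Δc_t. The same matrix A therefore acts on the delta stream, and the bias cancels. Applying the argument twice covers delta-deltas. This is only a sketch. A and b are random placeholders, and the paper's maximum-likelihood estimation of the transform is not implemented here.

```python
import random

def mat_vec(A, x):
    """Multiply a d x d matrix (list of rows) by a length-d vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def affine(A, b, x):
    """Apply the affine transform A x + b to one static feature vector."""
    return [y + bi for y, bi in zip(mat_vec(A, x), b)]

def regression_weights(K):
    """Assumed regression delta weights w_k = k / (2 * sum_{j=1..K} j^2), k = -K..K.
    They sum to zero, which is what cancels the bias on the dynamic streams."""
    denom = 2 * sum(j * j for j in range(1, K + 1))
    return {k: k / denom for k in range(-K, K + 1)}

def deltas(frames, t, K):
    """Delta vector at frame index t, computed from frames t-K .. t+K."""
    w = regression_weights(K)
    d = len(frames[0])
    return [sum(w[k] * frames[t + k][i] for k in range(-K, K + 1)) for i in range(d)]

def delta_stream(frames, K):
    """Deltas for every frame that has K neighbours on both sides."""
    return [deltas(frames, t, K) for t in range(K, len(frames) - K)]

def max_abs_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

random.seed(0)
d, K, T = 4, 2, 9
A = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d)]  # placeholder "rotation" matrix
b = [random.uniform(-1, 1) for _ in range(d)]                      # placeholder bias
frames = [[random.uniform(-5, 5) for _ in range(d)] for _ in range(T)]
adapted = [affine(A, b, c) for c in frames]  # transform applied to static coefficients only

t = 4  # central frame, so both delta and delta-delta are defined
lhs = deltas(adapted, t, K)            # deltas of the adapted statics
rhs = mat_vec(A, deltas(frames, t, K)) # same matrix A applied to the original deltas, no bias

# Delta-deltas: take deltas of the delta streams (index shifts by K).
lhs2 = deltas(delta_stream(adapted, K), t - K, K)
rhs2 = mat_vec(A, deltas(delta_stream(frames, K), t - K, K))

print("delta stream matches A * delta:", max_abs_diff(lhs, rhs) < 1e-9)
print("delta-delta stream matches A * delta-delta:", max_abs_diff(lhs2, rhs2) < 1e-9)
```

If the delta weights did not sum to zero, a residual bias term b Σ_k w_k would remain on the dynamic streams. The zero-sum property is what lets a single static-domain transform determine the transforms of all streams.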
Keywords :
computational complexity; matrix algebra; maximum likelihood estimation; speech recognition; transforms; Wall Street Journal task; adaptation data requirements; affine transform; block-diagonal transformation; coefficients; complexity reduction; maximum likelihood criterion; memory requirements reduction; recognition accuracy; rotation matrix; speaker adaptation; static coefficients; temporal relationships; temporal structure constrained transformation; temporal structure constrained transforms; transformation parameters; Australia; Automatic speech recognition; Constraint optimization; Error analysis; Hidden Markov models; Loudspeakers; Maximum likelihood estimation; Maximum likelihood linear regression; Parameter estimation; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-7663-3
Type :
conf
DOI :
10.1109/ICASSP.2003.1198843
Filename :
1198843