DocumentCode
394307
Title
Temporal structure constrained transformation for speaker adaptation
Author
Choi, Eric H C ; Holter, Trym ; Epps, Julien ; Gopalakrishnan, Arm
Volume
1
fYear
2003
fDate
6-10 April 2003
Abstract
We suggest that, rather than modeling speaker mismatch as an affine transform of the entire feature vector, it can be modeled as an affine transform of the static coefficients, with additional constraints imposed by the temporal relationships among the streams of coefficients. As a result, the different streams share the same rotation matrix, which reduces the complexity and memory requirements of speaker adaptation and also reduces the amount of adaptation data needed. We present the solution for the case where temporal structure constrained transforms (TSCT) are optimized using the maximum likelihood criterion. The experiments presented in the paper show that, with the proposed approach, the same accuracy after adaptation on the Wall Street Journal (WSJ) task can be achieved using only 60% of the transformation parameters that a conventional block-diagonal transformation would require. In addition, TSCT provides better recognition accuracy when only a very limited amount of adaptation data is available.
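The claim that the streams end up sharing one rotation matrix can be illustrated with a small numerical sketch. It is not the paper's estimation procedure. It assumes that the dynamic (delta) stream is computed from the static stream by a standard linear regression formula, which the abstract does not specify; the names `affine`, `deltas`, and `max_abs_diff` are hypothetical helpers. Under that assumption, applying an affine transform A x + b to the statics and recomputing deltas gives the same result as applying A alone (zero bias) to the original deltas.

```python
import random

def affine(A, b, x):
    """Return A @ x + b for a square matrix A (list of rows) and vectors b, x."""
    return [sum(a * v for a, v in zip(row, x)) + b_i for row, b_i in zip(A, b)]

def deltas(frames, window=2):
    """Regression-style deltas (an assumed formula), repeating edge frames:
    delta_t = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2), k = 1..window
    """
    norm = 2 * sum(k * k for k in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        d = [0.0] * len(frames[0])
        for k in range(1, window + 1):
            fwd = frames[min(t + k, last)]
            bwd = frames[max(t - k, 0)]
            d = [d_i + k * (f - g) / norm for d_i, f, g in zip(d, fwd, bwd)]
        out.append(d)
    return out

def max_abs_diff(X, Y):
    """Largest element-wise absolute difference between two frame sequences."""
    return max(abs(x - y) for xr, yr in zip(X, Y) for x, y in zip(xr, yr))

# Toy data: 8 frames of 3-dimensional static coefficients, random A and b.
random.seed(0)
dim, n_frames = 3, 8
A = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
b = [random.uniform(-1, 1) for _ in range(dim)]
statics = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_frames)]

# Route 1: adapt the statics, then recompute the deltas from them.
deltas_of_adapted = deltas([affine(A, b, x) for x in statics])

# Route 2: apply the same matrix A, with zero bias, to the original deltas.
zero = [0.0] * dim
adapted_deltas = [affine(A, zero, d) for d in deltas(statics)]

# The two routes agree up to floating-point rounding.
gap = max_abs_diff(deltas_of_adapted, adapted_deltas)
print(f"max difference between the two routes: {gap:.2e}")
```

The bias cancels because each delta is a weighted difference of static frames, so only the matrix carries over to the dynamic stream. This is the sense in which a temporal constraint lets several streams reuse one matrix instead of estimating a separate block per stream.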
Keywords
computational complexity; matrix algebra; maximum likelihood estimation; speech recognition; transforms; Wall Street Journal task; adaptation data requirements; affine transform; block-diagonal transformation; coefficients; complexity reduction; maximum likelihood criterion; memory requirements reduction; recognition accuracy; rotation matrix; speaker adaptation; static coefficients; temporal relationships; temporal structure constrained transformation; temporal structure constrained transforms; transformation parameters; Australia; Automatic speech recognition; Constraint optimization; Error analysis; Hidden Markov models; Loudspeakers; Maximum likelihood estimation; Maximum likelihood linear regression; Parameter estimation; Vectors;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-7663-3
Type
conf
DOI
10.1109/ICASSP.2003.1198843
Filename
1198843