Temporal structure constrained transformation for speaker adaptation

Author

Choi, Eric H C ; Holter, Trym ; Epps, Julien ; Gopalakrishnan, Arm

Volume

1

fYear

2003

fDate

6-10 April 2003

Abstract

We suggest that, rather than modeling speaker mismatch as an affine transform of the entire feature vector, it can be modeled as an affine transform of the static coefficients, with additional constraints imposed by the temporal relationships between the streams of coefficients. As a result, the different streams share the same rotation matrix, which reduces the complexity and memory requirements of speaker adaptation as well as the amount of adaptation data required. We present the solution for the case where temporal structure constrained transforms (TSCT) are optimized using the maximum likelihood criterion. The experiments presented in the paper show that, with the proposed approach, the same post-adaptation accuracy on the Wall Street Journal (WSJ) task can be achieved using only 60% of the transformation parameters that a conventional block-diagonal transformation would require. In addition, TSCT provides better recognition accuracy when only a very limited amount of adaptation data is available.
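
The idea that the dynamic streams can share the static stream's rotation matrix can be illustrated with a small numerical check. The sketch below is not the paper's estimation procedure: it assumes delta coefficients are computed with the usual regression formula, whose weights sum to zero, and it uses random data with hypothetical names (`deltas`, `A`, `b`, `d = 13`). Under that assumption, an affine transform of the static coefficients implies that the deltas are transformed by the same matrix with no bias term.

```python
import numpy as np

def deltas(x, window=2):
    """Regression deltas over a (T, d) array; edges are padded by repetition."""
    T = x.shape[0]
    padded = np.pad(x, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    out = np.zeros_like(x, dtype=float)
    for k in range(1, window + 1):
        # k * (x[t + k] - x[t - k]) for every frame t
        out += k * (padded[window + k : window + k + T]
                    - padded[window - k : window - k + T])
    return out / denom

rng = np.random.default_rng(0)
d, T = 13, 50                        # assumed static dimension and frame count
static = rng.standard_normal((T, d))
A = rng.standard_normal((d, d))      # hypothetical shared matrix
b = rng.standard_normal(d)           # hypothetical bias on the static stream

# Transform the statics first, then compute deltas of the result.
adapted_static = static @ A.T + b
delta_of_adapted = deltas(adapted_static)

# Transform the original deltas directly with the same matrix, no bias.
adapted_delta = deltas(static) @ A.T

# Parameter counts for this toy setup only (static, delta, delta-delta streams).
shared_params = d * d + d            # one matrix and one bias
block_diag_params = 3 * (d * d + d)  # one matrix and one bias per stream
```

Here `delta_of_adapted` and `adapted_delta` agree up to floating-point error, because the bias cancels in the delta computation. The parameter counts describe only this simplified setup and are not meant to reproduce the 60% figure reported in the abstract.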

Keywords

computational complexity; matrix algebra; maximum likelihood estimation; speech recognition; transforms; Wall Street Journal task; adaptation data requirements; affine transform; block-diagonal transformation; coefficients; complexity reduction; maximum likelihood criterion; memory requirements reduction; recognition accuracy; rotation matrix; speaker adaptation; static coefficients; temporal relationships; temporal structure constrained transformation; temporal structure constrained transforms; transformation parameters; Australia; Automatic speech recognition; Constraint optimization; Error analysis; Hidden Markov models; Loudspeakers; Maximum likelihood linear regression; Parameter estimation; Vectors

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-7663-3

Type

conf

DOI

10.1109/ICASSP.2003.1198843

Filename

1198843