Vocal Tract Normalization Equals Linear Transformation in Cepstral Space

Author

Pitz, Michael; Ney, Hermann

Author_Institution

Lehrstuhl für Informatik, RWTH Aachen Univ., Germany

Volume

13

Issue

5

fYear

2005

Firstpage

930

Lastpage

944

Abstract

Vocal tract normalization (VTN) is a widely used speaker normalization technique that reduces the effect of differences in the length of the human vocal tract and improves the recognition accuracy of automatic speech recognition systems. We show that VTN results in a linear transformation in the cepstral domain; VTN and linear transformations have so far been considered independent approaches to speaker normalization. As a result, we can compute the Jacobian determinant of the transformation matrix, which allows the probability distributions used in speaker normalization for automatic speech recognition to be normalized. We further show that VTN can be viewed as a special case of Maximum Likelihood Linear Regression (MLLR). This explains previous experimental results in which the improvements obtained by VTN followed by MLLR were, in some cases, not additive. For three typical warping functions, the transformation matrix is calculated analytically, and we show that the matrices are diagonally dominant and can therefore be approximated by quindiagonal matrices.
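The claim that frequency warping acts linearly on the cepstrum can be checked numerically. The sketch below is illustrative only and is not the authors' analytic derivation. It assumes a plain cosine-series cepstrum of the log power spectrum on [0, π] (no mel filterbank), uses a bilinear (all-pass) warp with parameter alpha as an example warping function, and approximates integrals with a midpoint rule. All function names are hypothetical. It builds the matrix A that maps unwarped to warped cepstra, the log-Jacobian log|det A| (the change-of-variables term for features transformed by A), and a quindiagonal (five-band) approximation of A. For a cepstrum with at most K nonzero coefficients, the first K warped coefficients are given exactly by the K×K matrix; truncating a longer cepstrum makes the relation approximate.

```python
import math

def bilinear_warp(omega, alpha):
    # Assumed example warp (all-pass / bilinear); maps [0, pi] onto [0, pi] for |alpha| < 1.
    return omega + 2.0 * math.atan(alpha * math.sin(omega) / (1.0 - alpha * math.cos(omega)))

def _grid(num_points):
    # Midpoint-rule nodes on [0, pi]; cosines of distinct low orders are orthogonal on this grid.
    step = math.pi / num_points
    return [(j + 0.5) * step for j in range(num_points)], step

def cepstral_warp_matrix(warp, num_ceps, num_points=2048):
    """A[n][k] such that c_warped[n] = sum_k A[n][k] * c[k].

    Assumed convention: log S(w) = c[0] + 2 * sum_{k>=1} c[k] * cos(k*w),
    c[n] = (1/pi) * integral_0^pi log S(w) * cos(n*w) dw.
    """
    grid, step = _grid(num_points)
    warped = [warp(w) for w in grid]
    A = [[0.0] * num_ceps for _ in range(num_ceps)]
    A[0][0] = 1.0  # constant term: (1/pi) * integral cos(n*w) dw = delta_{n0}
    for n in range(num_ceps):
        for k in range(1, num_ceps):
            s = sum(math.cos(k * pw) * math.cos(n * w) for w, pw in zip(grid, warped))
            A[n][k] = (2.0 / math.pi) * s * step
    return A

def warped_cepstrum_direct(c, warp, num_points=2048):
    # Cepstrum of the warped log spectrum log S(warp(w)), by direct numerical integration.
    grid, step = _grid(num_points)
    def log_spec(w):
        return c[0] + 2.0 * sum(c[k] * math.cos(k * w) for k in range(1, len(c)))
    values = [log_spec(warp(w)) for w in grid]
    return [sum(v * math.cos(n * w) for v, w in zip(values, grid)) * step / math.pi
            for n in range(len(c))]

def apply_matrix(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def log_abs_det(A):
    # log|det A| via Gaussian elimination with partial pivoting (the log-Jacobian).
    M = [row[:] for row in A]
    size = len(M)
    total = 0.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            return float("-inf")
        M[col], M[pivot] = M[pivot], M[col]
        total += math.log(abs(M[col][col]))
        for r in range(col + 1, size):
            f = M[r][col] / M[col][col]
            for j in range(col, size):
                M[r][j] -= f * M[col][j]
    return total

def band(A, half_width):
    # Keep entries with |n - k| <= half_width; half_width=2 gives a quindiagonal matrix.
    return [[a if abs(n - k) <= half_width else 0.0 for k, a in enumerate(row)]
            for n, row in enumerate(A)]

def rel_frobenius_error(A, B):
    num = sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))
    den = sum(a ** 2 for ra in A for a in ra)
    return math.sqrt(num / den)

if __name__ == "__main__":
    K = 13
    A = cepstral_warp_matrix(lambda w: bilinear_warp(w, 0.05), K)
    print("log|det A| =", log_abs_det(A))
    print("quindiagonal relative error =", rel_frobenius_error(A, band(A, 2)))
```

The midpoint rule was chosen because it makes the identity warp reproduce the identity matrix up to rounding, which is a convenient sanity check; a larger system would more likely use NumPy (for example `numpy.linalg.slogdet` for the log-determinant).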

Keywords

Jacobian matrices; cepstral analysis; speech recognition; statistical distributions; Jacobian determinant; automatic speech recognition systems; cepstral space; linear transformation; maximum likelihood linear regression; probability distributions; speaker normalization technique; transformation matrix; vocal tract normalization; Automatic speech recognition; Distributed computing; Frequency; Human voice; Loudspeakers; Probability distribution; speaker adaptive modeling and training; speaker adaptive recognition; vocal tract (length) normalization

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/TSA.2005.848881

Filename

1495475