Title :
Vocal Tract Normalization Equals Linear Transformation in Cepstral Space
Author :
Pitz, Michael ; Ney, Hermann
Author_Institution :
Lehrstuhl für Informatik, RWTH Aachen Univ., Germany
Abstract :
Vocal tract normalization (VTN) is a widely used speaker normalization technique that reduces the effect of differing human vocal tract lengths and improves the recognition accuracy of automatic speech recognition systems. We show that VTN results in a linear transformation in the cepstral domain; VTN and linear cepstral transformations have so far been considered independent approaches to speaker normalization. We are now able to compute the Jacobian determinant of the transformation matrix, which allows the normalization of the probability distributions used in speaker normalization for automatic speech recognition. We show that VTN can be viewed as a special case of Maximum Likelihood Linear Regression (MLLR). Consequently, we can explain previous experimental results showing that improvements obtained by VTN and subsequent MLLR are not additive in some cases. For three typical warping functions, the transformation matrix is calculated analytically, and we show that the matrices are diagonally dominant and thus can be approximated by quindiagonal matrices.
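Illustration :
The central claim, that warping the frequency axis acts linearly on the cepstral coefficients, can be checked numerically. The sketch below is not taken from the paper: the piecewise-linear warping function, its breakpoint `omega0`, the cepstrum convention (a cosine series on [0, pi]), the direction of the warp, and all function names are assumptions chosen for illustration. The paper derives its matrices analytically, whereas this sketch integrates numerically with a midpoint rule.

```python
import numpy as np


def warp(omega, alpha, omega0=0.875 * np.pi):
    """Assumed piecewise-linear warping of [0, pi] onto itself.

    Slope alpha below the breakpoint omega0, then a straight segment
    that ends at (pi, pi). Requires alpha * omega0 < pi.
    """
    upper = alpha * omega0 + (np.pi - alpha * omega0) / (np.pi - omega0) * (omega - omega0)
    return np.where(omega <= omega0, alpha * omega, upper)


def log_spectrum(c, omega):
    """Log spectrum X(omega) = c_0 + 2 * sum_{k>=1} c_k cos(k * omega)."""
    k = np.arange(len(c))
    weights = np.where(k == 0, 1.0, 2.0)
    return np.cos(np.outer(omega, k)) @ (weights * c)


def cepstrum(spec, omega, n_coef):
    """c_n = (1/pi) * integral_0^pi X(omega) cos(n * omega) d omega.

    Assumes omega is a uniform midpoint grid on [0, pi], so the
    quadrature weight (pi / N) / pi reduces to 1 / N.
    """
    n = np.arange(n_coef)
    return np.cos(np.outer(n, omega)) @ spec / len(omega)


def warp_matrix(alpha, n_coef, n_grid=4096):
    """Matrix A with cepstrum(X(warp(omega))) = A @ c under the conventions above."""
    omega = (np.arange(n_grid) + 0.5) * np.pi / n_grid  # midpoint grid
    k = np.arange(n_coef)
    weights = np.where(k == 0, 1.0, 2.0)
    warped_basis = np.cos(np.outer(warp(omega, alpha), k)) * weights  # (grid, K)
    analysis = np.cos(np.outer(k, omega)) / n_grid                    # (K, grid)
    return analysis @ warped_basis


# Compare the matrix route with warping the spectrum directly.
n_coef, n_grid, alpha = 13, 4096, 1.05
rng = np.random.default_rng(0)
c = rng.normal(size=n_coef)
grid = (np.arange(n_grid) + 0.5) * np.pi / n_grid

A = warp_matrix(alpha, n_coef, n_grid)
direct = cepstrum(log_spectrum(c, warp(grid, alpha)), grid, n_coef)
print("max |A @ c - direct| =", np.max(np.abs(A @ c - direct)))
print("first 4x4 block of A:\n", np.round(A[:4, :4], 3))
```

Because `A` depends only on the warping factor and not on `c`, the same matrix applies to every frame of a speaker. Printing larger blocks of `A` for several values of `alpha` is one way to inspect how strongly the entries concentrate near the diagonal under these assumed conventions.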
Keywords :
Jacobian matrices; cepstral analysis; speech recognition; statistical distributions; Jacobian determinant; automatic speech recognition systems; cepstral space; linear transformation; maximum likelihood linear regression; probability distributions; speaker normalization technique; transformation matrix; vocal tract normalization; Automatic speech recognition; Distributed computing; Frequency; Human voice; Loudspeakers; Probability distribution; speaker adaptive modeling and training; speaker adaptive recognition; vocal tract (length) normalization
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
DOI :
10.1109/TSA.2005.848881