DocumentCode
1135488
Title
Vocal Tract Normalization Equals Linear Transformation in Cepstral Space
Author
Pitz, Michael ; Ney, Hermann
Author_Institution
Lehrstuhl für Informatik, RWTH Aachen Univ., Germany
Volume
13
Issue
5
fYear
2005
Firstpage
930
Lastpage
944
Abstract
Vocal tract normalization (VTN) is a widely used speaker normalization technique that reduces the effect of differing human vocal tract lengths and thereby improves the recognition accuracy of automatic speech recognition systems. We show that VTN amounts to a linear transformation in the cepstral domain, although VTN and linear cepstral transformations have so far been treated as independent approaches to speaker normalization. This makes it possible to compute the Jacobian determinant of the transformation matrix, which allows the probability distributions used in speaker normalization for automatic speech recognition to be normalized. We show that VTN can be viewed as a special case of Maximum Likelihood Linear Regression (MLLR). Consequently, we can explain previous experimental results showing that the improvements obtained by VTN and subsequent MLLR are not additive in some cases. For three typical warping functions, the transformation matrix is calculated analytically, and we show that the matrices are diagonally dominant and can thus be approximated by quindiagonal matrices.
Keywords
Jacobian matrices; cepstral analysis; speech recognition; statistical distributions; Jacobian determinant; automatic speech recognition systems; cepstral space; linear transformation; maximum likelihood linear regression; probability distributions; speaker normalization technique; transformation matrix; vocal tract normalization; Automatic speech recognition; Cepstral analysis; Distributed computing; Frequency; Human voice; Jacobian matrices; Loudspeakers; Maximum likelihood linear regression; Probability distribution; Speech recognition; Linear transformation; speaker adaptive modeling and training; speaker adaptive recognition; speech recognition; vocal tract (length) normalization;
fLanguage
English
Journal_Title
Speech and Audio Processing, IEEE Transactions on
Publisher
ieee
ISSN
1063-6676
Type
jour
DOI
10.1109/TSA.2005.848881
Filename
1495475