DocumentCode
960130
Title
Comments on Vocal Tract Length Normalization Equals Linear Transformation in Cepstral Space
Author
Afify, Mohamed ; Siohan, Olivier
Author_Institution
IBM T. J. Watson Res. Center, Yorktown Heights
Volume
15
Issue
5
fYear
2007
fDate
7/1/2007 12:00:00 AM
Firstpage
1731
Lastpage
1732
Abstract
The bilinear transformation (BT) is used for vocal tract length normalization (VTLN) in speech recognition systems. We prove two properties of the bilinear mapping that motivated the band-diagonal transform proposed by M. Afify and O. Siohan ("Constrained maximum likelihood linear regression for speaker adaptation," in Proc. ICSLP, Beijing, China, Oct. 2000). This contrasts with the claim in M. Pitz and H. Ney ("Vocal tract length normalization equals linear transformation in cepstral space," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 930-944, Sept. 2005) that the transform of Afify and Siohan was motivated by empirical observations.
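The abstract does not spell out the two properties, so the following is an illustration only, not a reconstruction of the paper's proofs. It numerically builds the linear map that a bilinear (first-order all-pass) frequency warp induces on a truncated real cepstrum, so its band structure can be inspected. Assumptions made here, not taken from the source: the warp phi(w) = w + 2*atan(alpha*sin(w) / (1 - alpha*cos(w))), the cepstral convention log|H(w)| = c[0] + 2*sum_{n>=1} c[n]*cos(n*w), and the hypothetical helper names `bilinear_warp` and `cepstral_warp_matrix`.

```python
import math


def bilinear_warp(omega: float, alpha: float) -> float:
    """Phase of a first-order all-pass section (assumed sign convention).

    For |alpha| < 1 it maps [0, pi] onto [0, pi]; positive alpha pushes
    frequencies upward.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def cepstral_warp_matrix(alpha: float, n_ceps: int,
                         n_points: int = 2048) -> list[list[float]]:
    """Return A with c_warped[m] = sum_n A[m][n] * c[n].

    Assumed convention: log|H(w)| = c[0] + 2 * sum_{n>=1} c[n] cos(n w),
    c[m] = (1/pi) * integral_0^pi log|H(w)| cos(m w) dw, and the warped
    log spectrum is log|H(phi(w))|. The integral uses the midpoint rule.
    """
    h = math.pi / n_points
    nodes = [(j + 0.5) * h for j in range(n_points)]
    warped = [bilinear_warp(w, alpha) for w in nodes]
    A = [[0.0] * n_ceps for _ in range(n_ceps)]
    for n in range(n_ceps):
        # Log spectrum of the unit cepstrum e_n, sampled at warped frequencies.
        basis = [1.0 if n == 0 else 2.0 * math.cos(n * t) for t in warped]
        for m in range(n_ceps):
            # Project the warped log spectrum back onto cos(m w).
            s = sum(b * math.cos(m * w) for b, w in zip(basis, nodes))
            A[m][n] = s * h / math.pi
    return A


if __name__ == "__main__":
    for row in cepstral_warp_matrix(alpha=0.05, n_ceps=8):
        print(" ".join(f"{x:+.3f}" for x in row))
```

Under these assumptions, expanding phi to first order (phi(w) ~ w + 2*alpha*sin(w)) gives A ~ I with A[n-1][n] ~ -n*alpha (-2*alpha for n = 1) and A[n+1][n] ~ +n*alpha, while entries two or more places off the diagonal are of order alpha^2. Because the first-order terms grow with n, the effective band widens for higher cepstral indices and larger |alpha|.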
Keywords
cepstral analysis; maximum likelihood estimation; regression analysis; speaker recognition; speech processing; audio processing; band-diagonal transform; bilinear transformation; cepstral space; linear transformation; maximum likelihood linear regression; speaker adaptation; speech processing; speech recognition system; vocal tract length normalization; Adaptation model; Cepstral analysis; Equations; Frequency; Linear regression; Maximum likelihood linear regression; Natural languages; Speech processing; Speech recognition; Transforms; Maximum-likelihood linear regression; speaker adaptation; speech recognition; vocal tract length normalization;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
ieee
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2007.896653
Filename
4244505