• DocumentCode
    960130
  • Title

    Comments on "Vocal Tract Length Normalization Equals Linear Transformation in Cepstral Space"

  • Author

    Afify, Mohamed ; Siohan, Olivier

  • Author_Institution
    IBM T. J. Watson Res. Center, Yorktown Heights
  • Volume
    15
  • Issue
    5
  • fYear
    2007
  • fDate
    7/1/2007 12:00:00 AM
  • Firstpage
    1731
  • Lastpage
    1732
  • Abstract
    The bilinear transformation (BT) is used for vocal tract length normalization (VTLN) in speech recognition systems. We prove two properties of the bilinear mapping that motivated the band-diagonal transform proposed in M. Afify and O. Siohan, "Constrained maximum likelihood linear regression for speaker adaptation," in Proc. ICSLP, Beijing, China, Oct. 2000. This contrasts with the statement in M. Pitz and H. Ney, "Vocal tract length normalization equals linear transformation in cepstral space," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 930-944, Sep. 2005, that the transform of Afify and Siohan was motivated by empirical observations.
  • Keywords
    cepstral analysis; maximum likelihood estimation; regression analysis; speaker recognition; speech processing; audio processing; band-diagonal transform; bilinear transformation; cepstral space; linear transformation; maximum likelihood linear regression; speaker adaptation; speech processing; speech recognition system; vocal tract length normalization; Adaptation model; Cepstral analysis; Equations; Frequency; Linear regression; Maximum likelihood linear regression; Natural languages; Speech processing; Speech recognition; Transforms; Maximum-likelihood linear regression; speaker adaptation; speech recognition; vocal tract length normalization
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2007.896653
  • Filename
    4244505