  • DocumentCode
    2365498
  • Title
    Frequency warping for speaker adaption of text-to-speech synthesis
  • Author
    Weixun Gao; Qiying Cao
  • Author_Institution
    Sch. of Inf. Sci. & Technol., Donghua University, Shanghai, China
  • fYear
    2010
  • fDate
    26-29 Sept. 2010
  • Firstpage
    307
  • Lastpage
    310
  • Abstract
    Vocal tract length normalization (VTLN) is commonly used in speech recognition to remove individual speaker characteristics. In this paper, we apply VTLN to speaker adaptation for speech synthesis. We propose a new frequency warping approach that reduces the spectral distance between the source and target speakers. The frequency warping function is based on a bilinear function, and the warping factor is generated dynamically, frame by frame. The warped spectra of the source speaker are then converted to LSPs to train hidden Markov models (HMMs). The HMMs are further adapted with the target speaker's data by maximum likelihood linear regression (MLLR). The experimental results show that our frequency warping approach brings the warped spectra of the source speaker closer to those of the target speaker, and that the resulting adapted HMMs perform better than HMMs trained on unwarped spectra in terms of voice naturalness and speaker similarity.
  • Keywords
    hidden Markov models; regression analysis; speech recognition; speech synthesis; bilinear function; frequency warping; maximum likelihood linear regression; speaker adaption; speaker similarity; spectrum distance; text-to-speech synthesis; vocal tract length normalization; voice naturalness; TTS; speaker adaptation;
  • fLanguage
    English
  • Publisher
    IET
  • Conference_Titel
    Wireless, Mobile and Multimedia Networks (ICWMNN 2010), IET 3rd International Conference on
  • Conference_Location
    Beijing
  • Type
    conf
  • DOI
    10.1049/cp.2010.0677
  • Filename
    5703015
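The abstract names a bilinear warping function with a warping factor chosen frame by frame, but it gives neither the formula nor the selection rule. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It assumes the common first-order all-pass (bilinear) warping omega' = omega + 2 arctan(alpha sin omega / (1 - alpha cos omega)). It also assumes that alpha is picked for each frame by a grid search that minimizes the Euclidean log-spectral distance to a time-aligned target frame (for example, aligned beforehand with DTW). The function names, the alpha grid, and the distance measure are hypothetical.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """First-order all-pass (bilinear) frequency warping (assumed form).

    Maps normalized angular frequency omega in [0, pi] onto [0, pi];
    alpha in (-1, 1) controls the warping, and alpha = 0 is the identity.
    """
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))

def warp_spectrum(spectrum, alpha):
    """Return S_w(omega) = S(bilinear_warp(omega, alpha)) on the same grid.

    `spectrum` holds magnitudes sampled uniformly on [0, pi]; values at the
    warped frequencies are obtained by linear interpolation.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    omega = np.linspace(0.0, np.pi, spectrum.shape[-1])
    return np.interp(bilinear_warp(omega, alpha), omega, spectrum)

def frame_warping_factors(src_frames, tgt_frames,
                          alphas=np.linspace(-0.3, 0.3, 61), eps=1e-10):
    """Hypothetical per-frame warping-factor selection by grid search.

    Assumes src_frames[t] and tgt_frames[t] are already time-aligned
    magnitude spectra. For each frame, picks the alpha whose warped source
    log spectrum has the smallest Euclidean distance to the target's.
    Returns the chosen alphas and the warped source frames.
    """
    src_frames = np.asarray(src_frames, dtype=float)
    tgt_frames = np.asarray(tgt_frames, dtype=float)
    chosen = np.empty(len(src_frames))
    warped = np.empty_like(src_frames)
    for t, (src, tgt) in enumerate(zip(src_frames, tgt_frames)):
        log_tgt = np.log(tgt + eps)
        candidates = [warp_spectrum(src, a) for a in alphas]
        dists = [np.linalg.norm(np.log(c + eps) - log_tgt) for c in candidates]
        k = int(np.argmin(dists))
        chosen[t] = alphas[k]
        warped[t] = candidates[k]
    return chosen, warped
```

As a usage check, warp a synthetic two-peak source spectrum with alpha = 0.1 to serve as the "target." The grid search should then recover a factor of about 0.1 for that frame. Because the warping factor varies per frame rather than being fixed per speaker, the mapping can follow spectral differences that change over time. The cost is one candidate warp per grid point per frame.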