Title :
Voice characteristics conversion for HMM-based speech synthesis system
Author :
Masuko, Takashi ; Tokuda, Keiichi ; Kobayashi, Takao ; Imai, Satoshi
Author_Institution :
Precision & Intelligence Lab., Tokyo Inst. of Technol., Yokohama, Japan
Abstract :
We describe an approach to voice characteristics conversion for an HMM-based text-to-speech synthesis system. Because this system uses phoneme HMMs as speech units, voice characteristics can be converted by appropriately modifying the HMM parameters. To transform the voice characteristics of the synthesized speech toward those of a target speaker, we applied the maximum a posteriori estimation and vector field smoothing (MAP/VFS) algorithm to the phoneme HMMs. With five or eight sentences of adaptation data, speech synthesized from a set of adapted tied triphone HMMs with approximately 2,000 distributions was judged closer to the target speaker in 79.7% and 90.6% of responses, respectively, in an ABX listening test.
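The abstract names MAP/VFS without giving its formulas. The following is a minimal Python sketch of the general idea, assuming the standard conjugate-prior MAP update for Gaussian mean vectors and a distance-weighted k-nearest-neighbour form of vector field smoothing. The function names `map_mean` and `vfs`, the prior weight `tau`, the neighbour count `k`, the exponential weighting, and the width `f` are illustrative assumptions, not settings from the paper. The sketch adapts only mean vectors and ignores variances, transition probabilities, and the tied-triphone clustering.

```python
import math


def map_mean(prior_mean, frames, occupancies, tau=10.0):
    """MAP estimate of one Gaussian mean under a conjugate prior centred on prior_mean.

    frames: adaptation feature vectors; occupancies: occupancy probabilities
    gamma_t of this distribution for each frame; tau > 0: prior weight (assumed).
    """
    total = tau + sum(occupancies)
    acc = [tau * m for m in prior_mean]
    for x, g in zip(frames, occupancies):
        for d, value in enumerate(x):
            acc[d] += g * value
    return [a / total for a in acc]


def _dist(a, b):
    """Euclidean distance between two mean vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def vfs(prior_means, adapted, k=3, f=1.0):
    """Vector field smoothing over all distributions (hypothetical weighting).

    prior_means: speaker-independent mean vector of every distribution.
    adapted: {index: MAP-adapted mean} for distributions that had data.
    Returns adapted means for every distribution.
    """
    # Transfer vectors exist only where adaptation data were observed.
    transfer = {
        i: [a - p for a, p in zip(mu, prior_means[i])] for i, mu in adapted.items()
    }
    result = []
    for j, mu0 in enumerate(prior_means):
        # Other adapted distributions, nearest first in the prior mean space.
        others = sorted(
            (i for i in transfer if i != j),
            key=lambda i: _dist(mu0, prior_means[i]),
        )
        # Adapted distributions smooth their own vector with k neighbours;
        # unadapted ones interpolate from their k nearest adapted neighbours.
        neighbours = ([j] if j in transfer else []) + others[:k]
        weights = [math.exp(-_dist(mu0, prior_means[i]) / f) for i in neighbours]
        norm = sum(weights)
        if norm == 0.0:
            # No usable neighbours: keep the speaker-independent mean.
            result.append(list(mu0))
            continue
        shift = [
            sum(w * transfer[i][d] for w, i in zip(weights, neighbours)) / norm
            for d in range(len(mu0))
        ]
        result.append([m + s for m, s in zip(mu0, shift)])
    return result
```

In this sketch, a distribution that never occurs in the adaptation data is still moved by a weighted average of its nearest adapted neighbours' shifts, which is how a few sentences can influence distributions they do not cover.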
Keywords :
hidden Markov models; maximum likelihood estimation; smoothing methods; speech processing; speech synthesis; ABX listening test; HMM based speech synthesis system; HMM parameters; MAP/VFS algorithm; adaptation data; adapted tied triphone HMM; distributions; maximum a posteriori estimation; phoneme HMM; sentences; speech samples; speech units; synthesized speech; target speaker; text to speech synthesis system; vector field smoothing; voice characteristics conversion; Cepstral analysis; Computer science; Data analysis; Electronic mail; Hidden Markov models; Laboratories; Spatial databases; Speech analysis; Speech synthesis; Testing;
Conference_Title :
Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
Conference_Location :
Munich
Print_ISBN :
0-8186-7919-0
DOI :
10.1109/ICASSP.1997.598807