Voice conversion based on improved GMM and spectrum with synchronous prosody

Author

Bing, Zhang ; Yibiao, Yu

Author_Institution

Sch. of Electron. & Inf. Eng., Soochow Univ., Suzhou

fYear

2008

Firstpage

659

Lastpage

662

Abstract

A new voice conversion approach is proposed, based on an improved GMM speaker model and a short-time spectrum with synchronous prosody. The improved GMM speaker model, which is trained on feature vectors of the source and target speakers, can overcome the over-smoothing phenomenon. The short-time spectrum with prosody is composed of LSF parameters and a pitch parameter. It describes the speaker's vocal tract and excitation characteristics more accurately than conventional methods, in which the pitch is usually set to a constant. Experimental results show that this method effectively describes the individual characteristics of the source and target speakers and the transformation relationship between them. In addition, the transformed speech has good quality, and the speaker's individuality is converted well.

Keywords

Gaussian processes; speaker recognition; spectral analysis; speech processing; GMM speaker model; Gaussian mixture model; feature vector; linear spectrum frequency; pitch parameter; speaker vocal tract; speech quality; synchronous prosody; voice conversion; Artificial neural networks; Feature extraction; Frequency; Hidden Markov models; Linear predictive coding; Linear regression; Loudspeakers; Speech analysis; Transfer functions; Vector quantization; Improved GMM; LSF; Spectrum with prosody; Voice conversion;

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal Processing, 2008. ICSP 2008. 9th International Conference on

Conference_Location

Beijing

Print_ISBN

978-1-4244-2178-7

Electronic_ISBN

978-1-4244-2179-4

Type

conf

DOI

10.1109/ICOSP.2008.4697217

Filename

4697217