DocumentCode
134207
Title
Pitch transformation in neural network based voice conversion
Author
Feng-Long Xie ; Yao Qian ; Frank K. Soong ; Haifeng Li
Author_Institution
Harbin Inst. of Technol., Harbin, China
fYear
2014
fDate
12-14 Sept. 2014
Firstpage
197
Lastpage
200
Abstract
In the voice conversion task, prosody conversion, especially pitch conversion, is a challenging research topic because of the discontinuity of pitch. Conventionally, pitch conversion is achieved by adjusting the mean and variance of the source pitch distribution to match those of the target pitch distribution. This method removes most of the detailed information about the speaker's prosody and maintains only the global F0 contour. In this paper, we propose a neural network based pitch conversion system which converts F0 and spectral features jointly, frame by frame. Experimental results show that neural network based pitch conversion can significantly reduce the unvoiced/voiced error and the RMSE of F0 between converted pitch and target pitch compared with the conventional Gaussian normalized transformation method. Wavelet decomposition of F0 can further improve the performance of voice conversion.
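The conventional baseline named in the abstract (matching the mean and variance of the source pitch to the target) and the two reported metrics can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the record: the transformation is applied to log-F0, unvoiced frames are marked by an F0 of 0, and the RMSE is taken only over frames voiced in both contours. The function names are hypothetical and do not come from the paper.

```python
import math


def gaussian_normalized_f0(source_f0, src_mean, src_std, tgt_mean, tgt_std):
    """Shift and scale voiced source F0 values to the target statistics.

    Assumption: means and standard deviations are computed on log-F0.
    Frames with F0 <= 0 are treated as unvoiced and returned as 0.0.
    """
    converted = []
    for f0 in source_f0:
        if f0 <= 0:
            converted.append(0.0)
            continue
        z = (math.log(f0) - src_mean) / src_std  # normalize with source stats
        converted.append(math.exp(tgt_mean + tgt_std * z))  # rescale to target
    return converted


def f0_rmse(converted, target):
    """RMSE in Hz over frames that are voiced in both contours (assumption)."""
    pairs = [(c, t) for c, t in zip(converted, target) if c > 0 and t > 0]
    if not pairs:
        return float("nan")
    return math.sqrt(sum((c - t) ** 2 for c, t in pairs) / len(pairs))


def uv_error_rate(converted, target):
    """Fraction of frames whose voiced/unvoiced decision differs."""
    n = min(len(converted), len(target))
    if n == 0:
        return float("nan")
    mismatches = sum((c > 0) != (t > 0) for c, t in zip(converted, target))
    return mismatches / n
```

Because this mapping is a single global shift and scale, it leaves the voiced/unvoiced pattern of the source unchanged and cannot alter the local shape of the contour, which is the limitation the abstract attributes to the conventional method.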
Keywords
neural nets; speech processing; statistical analysis; wavelet transforms; Gaussian normalized transformation method; RMSE; global F0 contour; mean; neural network based voice conversion; pitch conversion; pitch discontinuity property; pitch distribution; pitch transformation; prosody conversion; root mean square error; spectral feature; variance; wavelet decomposition; Artificial neural networks; Context; Speech; Training; Vectors; Wavelet transforms; neural network; pitch; voice conversion
fLanguage
English
Publisher
IEEE
Conference_Titel
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location
Singapore
Type
conf
DOI
10.1109/ISCSLP.2014.6936599
Filename
6936599