DocumentCode :
1224471
Title :
Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory
Author :
Toda, Tomoki ; Black, Alan W. ; Tokuda, Keiichi
Author_Institution :
Graduate School of Information Science, Nara Institute of Science and Technology, Nara
Volume :
15
Issue :
8
Year :
2007
Firstpage :
2222
Lastpage :
2235
Abstract :
In this paper, we describe a novel spectral conversion method for voice conversion (VC). A Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers. The conventional method converts spectral parameters frame by frame based on the minimum mean square error. Although it is reasonably effective, speech quality deteriorates because of two problems: 1) the frame-based conversion process does not always produce appropriate spectral movements, and 2) the converted spectra are excessively smoothed by statistical modeling. To address these problems, we propose a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory. Not only static but also dynamic feature statistics are used to obtain an appropriate converted spectrum sequence. Moreover, the oversmoothing effect is alleviated by considering a global variance feature of the converted spectra. Experimental results indicate that the proposed method dramatically improves the performance of VC in terms of both speech quality and conversion accuracy for speaker individuality.
Keywords :
Gaussian processes; least mean squares methods; maximum likelihood estimation; probability; spectral analysis; speech processing; speech synthesis; Gaussian mixture model; joint probability density; maximum-likelihood estimation; minimum mean square error; spectral conversion method; spectral parameter trajectory; speech quality; speech synthesis; statistical modeling; voice conversion; Loudspeakers; Maximum likelihood estimation; Mean square error methods; Natural languages; Parameter estimation; Speech enhancement; Speech processing; Statistics; Training data; Virtual colonoscopy; Dynamic feature; Gaussian mixture model (GMM); global variance; maximum-likelihood estimation (MLE); voice conversion (VC);
Language :
English
Journal_Title :
IEEE Transactions on Audio, Speech, and Language Processing
Publisher :
IEEE
ISSN :
1558-7916
Type :
Journal article
DOI :
10.1109/TASL.2007.907344
Filename :
4317579