Title :
Statistical Approach for Voice Personality Transformation
Author_Institution :
Dept. of Electron. Eng., Konkuk Univ., Seoul
Abstract :
A voice transformation method that changes a source speaker's utterances so that they sound similar to those of a target speaker is described. Speaker individuality transformation is achieved by altering the LPC cepstrum, average pitch period, and average speaking rate. The main objective of the work is to build a nonlinear relationship between the acoustic feature parameters of the two speakers, based on a probabilistic model. The conversion rules involve probabilistic classification and a cross-correlation probability between the acoustic features of the two speakers. The parameters of the conversion rules are estimated by maximizing the likelihood of the training data. To obtain transformed speech signals that are perceptually closer to the target speaker's voice, prosody modification is also applied. Prosody modification is achieved by scaling the excitation spectrum and by time-scale modification with appropriate modification factors. An evaluation by objective tests and informal listening tests indicated the effectiveness of the proposed transformation method. We also confirmed that the proposed method leads to smoothly evolving spectral contours over time, which, from a perceptual standpoint, produced results superior to those of conventional vector quantization (VQ)-based methods.
Keywords :
maximum likelihood estimation; probability; speech processing; LPC cepstrum; acoustical features; cross correlation probability; probabilistic classification; probabilistic model; prosody modification; scaling excitation spectrum; spectral contours; speech signals; statistical approach; target speaker; time scale modification; voice personality transformation; Cepstrum; Feature extraction; Linear predictive coding; Loudspeakers; Natural languages; Signal processing; Speech synthesis; Testing; Transfer functions; Maximum likelihood (ML) estimation; voice conversion
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2006.876760