Regional Information Center for Science and Technology - Statistical Approach for Voice Personality Transformation

DocumentCode :

990070

Title :

Statistical Approach for Voice Personality Transformation

Author :

Lee, Ki-Seung

Author_Institution :

Dept. of Electron. Eng., Konkuk Univ., Seoul

Volume :

Issue :

fYear :

2007

Firstpage :

641

Lastpage :

651

Abstract :

A voice transformation method that modifies a source speaker's utterances so that they sound similar to those of a target speaker is described. Speaker individuality is transformed by altering the LPC cepstrum, the average pitch period and the average speaking rate. The main objective of the work is to build a nonlinear relationship between the acoustic feature parameters of the two speakers, based on a probabilistic model. The conversion rules involve probabilistic classification and a cross-correlation probability between the acoustic features of the two speakers. The parameters of the conversion rules are obtained by maximum likelihood estimation on the training data. To obtain transformed speech signals that are perceptually closer to the target speaker's voice, prosody modification is also applied; it is achieved by scaling the excitation spectrum and by time-scale modification with appropriate modification factors. Objective tests and informal listening tests clearly indicated the effectiveness of the proposed transformation method. We also confirmed that the proposed method yields spectral contours that evolve smoothly over time, which, from a perceptual standpoint, produced results superior to those of conventional vector quantization (VQ)-based methods.
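The abstract does not state the conversion equations. As an illustration only, the sketch below uses the widely known joint-density Gaussian mixture model (GMM) regression, a related probabilistic mapping that is not claimed to be the paper's conversion rule. A GMM is fitted to time-aligned source/target feature vectors by maximum likelihood (EM), and each source frame is mapped to the posterior-weighted sum of per-component linear regressions. The names `fit_joint_gmm`, `convert` and `prosody_factors` are hypothetical; the toy one-dimensional features stand in for LPC cepstra, and frame alignment (for example by dynamic time warping) is assumed to have been done already. The prosody helper only computes global factors from the average pitch period and average speaking rate; it does not implement excitation-spectrum scaling or time-scale modification.

```python
import numpy as np

# Sketch of joint-density GMM regression for spectral conversion.
# ASSUMPTION: this is the standard joint-GMM mapping, used here only as an
# illustration; it is not presented as the paper's exact conversion rule.
# Rows of x / y stand in for time-aligned source / target LPC-cepstrum frames.


def _log_gauss(z, mean, cov):
    """Log-density of each row of z under N(mean, cov)."""
    d = z.shape[1]
    chol = np.linalg.cholesky(cov)
    white = np.linalg.solve(chol, (z - mean).T)  # L^{-1} (z - mean), shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (np.sum(white ** 2, axis=0) + log_det + d * np.log(2.0 * np.pi))


def _posteriors(z, weights, means, covs):
    """Normalised component posteriors P(k | z), computed in the log domain."""
    log_p = np.stack([np.log(weights[k]) + _log_gauss(z, means[k], covs[k])
                      for k in range(len(weights))], axis=1)
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    return post / post.sum(axis=1, keepdims=True)


def fit_joint_gmm(x, y, n_components=4, n_iter=50, reg=1e-6, seed=0):
    """Maximum-likelihood (EM) fit of a full-covariance GMM on z = [x, y]."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    z = np.hstack([x, y])
    n, d = z.shape
    rng = np.random.default_rng(seed)
    means = z[rng.choice(n, n_components, replace=False)].copy()
    covs = np.array([np.cov(z.T) + reg * np.eye(d)] * n_components)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        resp = _posteriors(z, weights, means, covs)  # E-step: responsibilities
        nk = resp.sum(axis=0) + 1e-10                # M-step: soft counts
        weights = nk / nk.sum()
        means = (resp.T @ z) / nk[:, None]
        for k in range(n_components):
            diff = z - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)
    return weights, means, covs


def convert(x, weights, means, covs, dx):
    """E[y | x] = sum_k P(k|x) * (mu_y + S_yx S_xx^-1 (x - mu_x))."""
    x = np.asarray(x, float).reshape(len(x), -1)
    # Posteriors use only the source (x) marginal of each joint component.
    post = _posteriors(x, weights, means[:, :dx], covs[:, :dx, :dx])
    y = np.zeros((len(x), means.shape[1] - dx))
    for k in range(len(weights)):
        sxx, syx = covs[k, :dx, :dx], covs[k, dx:, :dx]
        shift = np.linalg.solve(sxx, (x - means[k, :dx]).T).T  # S_xx^-1 (x - mu_x)
        y += post[:, k, None] * (means[k, dx:] + shift @ syx.T)
    return y


def prosody_factors(src_periods, tgt_periods, src_rate, tgt_rate):
    """Hypothetical helper: global pitch-period and time-scale factors."""
    period_scale = np.mean(tgt_periods) / np.mean(src_periods)
    time_scale = src_rate / tgt_rate  # > 1 stretches speech for a slower target
    return float(period_scale), float(time_scale)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    src = rng.uniform(-3.0, 3.0, size=(2000, 1))
    tgt = np.sin(src) + 0.05 * rng.standard_normal((2000, 1))  # toy nonlinear map
    w, m, c = fit_joint_gmm(src, tgt, n_components=4)
    pred = convert(src, w, m, c, dx=1)
    print("MSE:", float(np.mean((pred - tgt) ** 2)), "target var:", float(np.var(tgt)))
    print(prosody_factors([8.0, 8.4], [5.0, 5.2], src_rate=5.0, tgt_rate=4.0))
```

Because the posteriors P(k | x) vary continuously with x, the mapped frames change smoothly instead of jumping between codebook entries as a hard VQ mapping does; this is a general property of soft mixture mappings and is not a reproduction of the paper's experiments.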

Keywords :

maximum likelihood estimation; probability; speech processing; LPC cepstrum; acoustical features; cross correlation probability; probabilistic classification; probabilistic model; prosody modification; scaling excitation spectrum; spectral contours; speech signals; statistical approach; target speaker; time scale modification; voice personality transformation; Cepstrum; Feature extraction; Linear predictive coding; Loudspeakers; Maximum likelihood estimation; Natural languages; Signal processing; Speech synthesis; Testing; Transfer functions; Maximum likelihood (ML) estimation; voice conversion

fLanguage :

English

Journal_Title :

IEEE Transactions on Audio, Speech, and Language Processing

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2006.876760

Filename :

4067042

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=990070