Regional Information Center for Science and Technology - HMM-based sequence-to-frame mapping for voice conversion

DocumentCode :

2791879

Title :

HMM-based sequence-to-frame mapping for voice conversion

Author :

Qiao, Yu ; Saito, Daisuke ; Minematsu, Nobuaki

Author_Institution :

Univ. of Tokyo, Tokyo, Japan

fYear :

2010

fDate :

14-19 March 2010

Firstpage :

4830

Lastpage :

4833

Abstract :

Voice conversion can be reduced to the problem of finding a transformation function between the corresponding speech sequences of two speakers. Perhaps most voice conversion methods are GMM-based statistical mapping methods. However, the classical GMM-based mapping is frame-to-frame and cannot take into account the contextual information that exists across a speech sequence. It is well known that the HMM provides an efficient way to model the density of a whole speech sequence and has achieved great success in speech recognition and synthesis. Inspired by this fact, this paper studies how to use HMMs for voice conversion. We derive an HMM-based sequence-to-frame mapping function through statistical analysis. Unlike previous HMM-based voice conversion methods, which use forced alignment for segmentation and transform the frames aligned to each state with that state's linear transformation, our method uses a soft mapping function: a weighted sum of linear transformations. The weights are calculated as the HMM posterior probabilities of the frames. We also propose and compare two methods for learning the parameters of the mapping function, namely least squares error estimation and maximum likelihood estimation. We carried out experiments to examine the proposed HMM-based method for voice conversion.
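The soft mapping described in the abstract can be illustrated with a minimal Python sketch. It assumes a plain HMM in which each state emits a single diagonal-covariance Gaussian, and it assumes the per-state transforms (W_j, b_j) have already been trained, for example by least squares or maximum likelihood estimation, which is not shown. The function names `state_posteriors` and `convert` and all toy values are hypothetical; the paper's actual model (its features, emission densities, or use of dynamic features) may differ. The frame posteriors gamma_t(j) come from the standard scaled forward-backward algorithm, so each converted frame y_t = sum_j gamma_t(j) (W_j x_t + b_j) depends on the whole input sequence. This dependence is what makes the mapping sequence-to-frame rather than frame-to-frame.

```python
# Sketch only: one diagonal Gaussian per state and pre-trained per-state
# transforms are assumptions, not the paper's exact setup.
import math
from typing import List, Sequence

Matrix = List[List[float]]


def diag_gaussian_pdf(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Density of frame x under a Gaussian with diagonal covariance."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def state_posteriors(frames: Matrix, start: List[float], trans: Matrix,
                     means: Matrix, variances: Matrix) -> Matrix:
    """gamma[t][j] = P(state j at frame t | whole sequence), via scaled forward-backward."""
    T, N = len(frames), len(start)
    emit = [[diag_gaussian_pdf(x, means[j], variances[j]) for j in range(N)] for x in frames]

    # Forward pass; each alpha[t] is normalised to sum to 1 and its scale factor kept.
    alpha: Matrix = []
    scale: List[float] = []
    for t in range(T):
        if t == 0:
            a = [start[j] * emit[0][j] for j in range(N)]
        else:
            a = [sum(alpha[t - 1][i] * trans[i][j] for i in range(N)) * emit[t][j]
                 for j in range(N)]
        c = sum(a)  # assumes every frame has non-negligible likelihood under the HMM
        alpha.append([v / c for v in a])
        scale.append(c)

    # Backward pass, reusing the forward scale factors.
    beta: Matrix = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[i][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(N))
                   / scale[t + 1] for i in range(N)]

    # Posterior = alpha * beta, renormalised per frame.
    gamma: Matrix = []
    for t in range(T):
        g = [alpha[t][j] * beta[t][j] for j in range(N)]
        s = sum(g)
        gamma.append([v / s for v in g])
    return gamma


def convert(frames: Matrix, gamma: Matrix, weights: List[Matrix], biases: Matrix) -> Matrix:
    """Soft mapping y_t = sum_j gamma[t][j] * (W_j x_t + b_j)."""
    out: Matrix = []
    for x, g in zip(frames, gamma):
        y = [0.0] * len(biases[0])
        for j, gj in enumerate(g):
            wx = [sum(w * xi for w, xi in zip(row, x)) for row in weights[j]]
            for d in range(len(y)):
                y[d] += gj * (wx[d] + biases[j][d])
        out.append(y)
    return out


if __name__ == "__main__":
    # Toy 2-state HMM over 1-D features (all values hypothetical).
    start = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.1, 0.9]]
    means = [[-1.0], [1.0]]
    variances = [[0.25], [0.25]]
    frames = [[-1.0], [-0.9], [0.0], [1.1], [1.0]]
    gamma = state_posteriors(frames, start, trans, means, variances)
    # Two different per-state transforms: the output blends them by posterior weight.
    weights = [[[1.0]], [[0.5]]]
    biases = [[0.2], [-0.3]]
    for x, g, y in zip(frames, gamma, convert(frames, gamma, weights, biases)):
        print(x, [round(v, 3) for v in g], [round(v, 3) for v in y])
```

Replacing each row of gamma with a one-hot vector for the state chosen by a forced alignment would reduce this to the hard, per-state linear transformation of the earlier HMM-based methods that the abstract contrasts against. The posterior weights instead blend neighbouring states' transforms, which matters most near state boundaries.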

Keywords :

Gaussian processes; hidden Markov models; speech processing; GMM-based statistical mapping methods; Gaussian mixture model; HMM-based sequence-to-frame mapping; frame posterior probability; soft mapping function; speech recognition; speech sequence; speech synthesis; statistical analysis; voice conversion; Cepstral analysis; Hidden Markov models; Least squares approximation; Maximum likelihood estimation; Probability; Speech recognition; Speech synthesis; Statistical analysis; Vectors; HMM; Voice conversion; sequence-to-frame mapping

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on

Conference_Location :

Dallas, TX

ISSN :

1520-6149

Print_ISBN :

978-1-4244-4295-9

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2010.5495141

Filename :

5495141

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2791879