Title :
HMM-based sequence-to-frame mapping for voice conversion
Author :
Qiao, Yu ; Saito, Daisuke ; Minematsu, Nobuaki
Author_Institution :
Univ. of Tokyo, Tokyo, Japan
Abstract :
Voice conversion can be reduced to the problem of finding a transformation function between the corresponding speech sequences of two speakers. Perhaps the most popular voice conversion methods are GMM-based statistical mapping methods. However, classical GMM-based mapping is frame-to-frame and cannot take into account the contextual information that exists over a speech sequence. It is well known that the HMM provides an efficient way to model the density of a whole speech sequence and has achieved great success in speech recognition and synthesis. Inspired by this fact, this paper studies how to use the HMM for voice conversion. We derive an HMM-based sequence-to-frame mapping function through statistical analysis. Unlike previous HMM-based voice conversion methods, which used forced alignment for segmentation and transformed the frames aligned to each state with that state's associated linear transformation, our method uses a soft mapping function: a weighted sum of linear transformations. The weights are calculated as the HMM posterior probabilities of the frames. We also propose and compare two methods for learning the parameters of the mapping function, namely least squares error estimation and maximum likelihood estimation. We carried out experiments to evaluate the proposed HMM-based method for voice conversion.
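The soft mapping described in the abstract (a weighted sum of per-state linear transformations, weighted by HMM state posteriors) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: each HMM state emits a single diagonal-covariance Gaussian; each state j carries a hypothetical affine transform (A[j], b[j]); the posteriors gamma[t, j] = P(q_t = j | whole source sequence) come from a standard log-domain forward-backward pass; and the transforms are fitted by ordinary least squares on parallel, already time-aligned source/target frames. The names `state_posteriors`, `convert`, and `fit_lse` are hypothetical, and the maximum likelihood variant is not shown.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)  # avoid inf - inf when a slice is all -inf
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def log_gauss_diag(X, means, variances):
    """log N(x_t; mu_j, diag(var_j)) for every frame t and state j -> shape (T, J)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances[None, :, :], axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :])


def state_posteriors(X, pi, trans, means, variances):
    """gamma[t, j] = P(q_t = j | x_1..x_T), via log-domain forward-backward."""
    log_b = log_gauss_diag(X, means, variances)
    T, J = log_b.shape
    log_a = np.log(trans)
    log_alpha = np.empty((T, J))
    log_beta = np.zeros((T, J))  # beta_T(j) = 1
    log_alpha[0] = np.log(pi) + log_b[0]
    for t in range(1, T):
        # alpha_t(j) = b_j(x_t) * sum_i alpha_{t-1}(i) * a_ij
        log_alpha[t] = log_b[t] + logsumexp(log_alpha[t - 1][:, None] + log_a, axis=0)
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_ij * b_j(x_{t+1}) * beta_{t+1}(j)
        log_beta[t] = logsumexp(log_a + (log_b[t + 1] + log_beta[t + 1])[None, :], axis=1)
    log_gamma = log_alpha + log_beta
    return np.exp(log_gamma - logsumexp(log_gamma, axis=1)[:, None])


def convert(X, gamma, A, b):
    """Soft mapping y_t = sum_j gamma[t, j] * (A[j] @ x_t + b[j]); A: (J, E, D), b: (J, E)."""
    per_state = np.einsum('jed,td->tje', A, X) + b[None, :, :]
    return np.einsum('tj,tje->te', gamma, per_state)


def fit_lse(X, Y, gamma):
    """Least squares fit of all A[j], b[j] from aligned source X (T, D) and target Y (T, E)."""
    T, D = X.shape
    J = gamma.shape[1]
    Z = np.hstack([X, np.ones((T, 1))])  # augmented frames [x_t; 1]
    # The model is linear in the stacked parameters: y_t = W^T kron(gamma_t, z_t).
    Phi = (gamma[:, :, None] * Z[:, None, :]).reshape(T, J * (D + 1))
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)  # shape (J*(D+1), E)
    W = W.reshape(J, D + 1, -1)
    A = np.transpose(W[:, :D, :], (0, 2, 1))  # (J, E, D)
    b = W[:, D, :]  # (J, E)
    return A, b
```

Because gamma is computed from the entire source utterance, each converted frame depends on the whole input sequence, which is the sequence-to-frame property the abstract describes. Replacing gamma with one-hot state assignments from a forced alignment would recover the hard, per-state transformation used by the earlier HMM-based methods the abstract contrasts against.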
Keywords :
Gaussian processes; hidden Markov models; speech processing; GMM-based statistical mapping methods; Gaussian mixture model; HMM-based sequence-to-frame mapping; frame posterior probability; soft mapping function; speech recognition; speech sequence; speech synthesis; statistical analysis; voice conversion; Cepstral analysis; Hidden Markov models; Least squares approximation; Maximum likelihood estimation; Probability; Speech recognition; Speech synthesis; Statistical analysis; Vectors; HMM; Voice conversion; sequence-to-frame mapping;
Conference_Title :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4244-4295-9
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2010.5495141