DocumentCode :
3752082
Title :
A probabilistic interpretation for artificial neural network-based voice conversion
Author :
Hsin-Te Hwang;Yu Tsao;Hsin-Min Wang;Yih-Ru Wang;Sin-Horng Chen
Author_Institution :
Dept. of Electrical and Computer Engineering, National Chiao Tung University, Hsinchu, Taiwan
fYear :
2015
Firstpage :
552
Lastpage :
558
Abstract :
Voice conversion (VC) using artificial neural networks (ANNs) has been shown to produce converted speech of better sound quality than VC using Gaussian mixture models (GMMs). Although ANN-based VC works reasonably well, there is still room for improvement. One promising approach is to adopt successful techniques from statistical model-based parameter generation (SMPG), such as the trajectory-based mapping approaches originally designed for GMM-based VC and hidden Markov model (HMM)-based speech synthesis. This study presents a probabilistic interpretation of ANN-based VC, which allows ANN-based VC to incorporate these SMPG techniques easily. Experimental results demonstrate that two trajectory-based mapping techniques, the maximum likelihood parameter generation (MLPG) algorithm and maximum likelihood-based trajectory mapping considering global variance (MLGV), effectively improve the performance of ANN-based VC compared with both conventional ANN-based VC using frame-based mapping and GMM-based VC using the MLPG algorithm. Moreover, ANN-based VC with the trajectory-based mapping techniques achieves performance comparable to that of state-of-the-art GMM-based VC with the MLGV algorithm.
Keywords :
"Artificial neural networks","Hidden Markov models","Speech","Linear programming","Training","Acoustics"
Publisher :
IEEE
Conference_Titel :
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific
Type :
conf
DOI :
10.1109/APSIPA.2015.7415330
Filename :
7415330