DocumentCode
3734368
Title
Voice conversion using deep neural network in super-frame feature space
Author
Wei Ye;Yibiao Yu
Author_Institution
School of Electronic and Information Engineering, Soochow University, Suzhou, China
fYear
2015
Firstpage
465
Lastpage
468
Abstract
This paper presents a voice conversion technique that uses deep neural networks (DNNs) to map the spectral envelopes of a source speaker to those of a target speaker. Short-time spectral envelopes are represented by linear prediction cepstrum coefficients (LPCC), and neighboring frames are gathered to form super-frames. The mapping ability of a DNN with a five-layer architecture consisting of three restricted Boltzmann machines (RBMs) is then exploited to derive the spectral conversion function. A comparative study of voice conversion using the DNN model and the conventional Gaussian mixture model (GMM) is conducted. Experimental results show that the speaker identification rate of the converted speech reaches 97.5%, which is 0.8% higher than that of the GMM method, and that the average cepstrum distortion is 0.87, which is 5.4% higher than the performance of the GMM method. ABX and MOS evaluations indicate that the conversion performance is better than that of the traditional GMM method under the parallel corpora condition.
Keywords
"Yttrium","Decision support systems","Training"
Publisher
IEEE
Conference_Titel
Intelligent Control and Information Processing (ICICIP), 2015 Sixth International Conference on
Print_ISBN
978-1-4799-1715-0
Type
conf
DOI
10.1109/ICICIP.2015.7388216
Filename
7388216