DocumentCode :
1843757
Title :
Tensor-based speaker space construction for arbitrary speaker conversion
Author :
Saito, Daisuke ; Minematsu, Nobuaki ; Hirose, Keikichi
Author_Institution :
Univ. of Tokyo, Tokyo, Japan
Volume :
1
fYear :
2012
fDate :
21-25 Oct. 2012
Firstpage :
595
Lastpage :
598
Abstract :
This paper describes a novel approach to flexible control of speaker characteristics using a tensor representation of the speaker space. In voice conversion studies, realizing conversion from/to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice Gaussian mixture model (EV-GMM) was proposed. In EVC, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors formed by concatenating the mean vectors of each speaker GMM. In this speaker space, each speaker is represented by a small number of weight parameters for the eigen-supervectors. In this paper, we revisit the construction of the speaker space by introducing tensor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond to the Gaussian components and the dimensions of the mean vector, respectively, and the speaker space is derived by tensor analysis of the set of these matrices. Our approach addresses an inherent problem of the supervector representation and improves voice conversion performance. Experimental results on one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
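The two speaker-space constructions contrasted in the abstract can be illustrated with a minimal NumPy sketch on synthetic random data; it is not the authors' implementation. The first part follows the EVC description: flatten each speaker's M x D matrix of GMM means into a supervector, run PCA, and represent a speaker by its weights on the eigen-supervectors. For the second part we assume a truncated HOSVD as one way to compute a Tucker decomposition (named in the keywords) of a speakers x components x dimensions tensor; the mean-speaker centering, the chosen ranks, and the least-squares estimation of a speaker's weights are our own assumptions. All function names and sizes are hypothetical.

```python
import numpy as np

# --- Eigenvoice (EV-GMM style) space built from GMM supervectors ---

def supervector(means: np.ndarray) -> np.ndarray:
    """Concatenate the M mean vectors (shape M x D) of one speaker GMM into an M*D vector."""
    return means.reshape(-1)

def eigenvoice_space(speakers: np.ndarray, k: int):
    """PCA over supervectors. `speakers` has shape (S, M, D); returns bias and k eigen-supervectors."""
    X = speakers.reshape(speakers.shape[0], -1)  # (S, M*D) supervectors
    bias = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - bias, full_matrices=False)
    return bias, Vt[:k]  # rows are orthonormal eigen-supervectors

def eigenvoice_weights(means, bias, basis):
    """A speaker is summarized by k weights: projection onto the eigen-supervectors."""
    return basis @ (supervector(means) - bias)

# --- Tensor (Tucker via truncated HOSVD) space keeping the M x D structure ---

def unfold(T: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding: the chosen axis becomes the rows, all other axes are flattened."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_dot(T: np.ndarray, U: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n product T x_n U, with U of shape (new_size, T.shape[mode])."""
    return np.moveaxis(np.tensordot(U, T, axes=(1, mode)), 0, mode)

def tucker_hosvd(T: np.ndarray, ranks):
    """One orthonormal factor per mode (speaker, component, dimension) plus a core tensor."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = mode_dot(core, U.T, mode)  # project each mode onto its factor
    return core, factors

def tucker_reconstruct(core: np.ndarray, factors) -> np.ndarray:
    """Multiply the core back out by every factor: core x_1 A x_2 B x_3 C."""
    T = core
    for mode, U in enumerate(factors):
        T = mode_dot(T, U, mode)
    return T

def tensor_speaker_weights(means, mean_speaker, core, factors):
    """Least-squares weights of one speaker matrix (M x D) on the speaker-mode basis matrices."""
    _, B, C = factors
    basis = mode_dot(mode_dot(core, B, 1), C, 2)  # (r1, M, D): one M x D basis matrix per weight
    A = basis.reshape(basis.shape[0], -1).T       # (M*D, r1)
    w, *_ = np.linalg.lstsq(A, (means - mean_speaker).reshape(-1), rcond=None)
    return w

# Hypothetical sizes: 20 training speakers, 8 mixture components, 6-dim features.
rng = np.random.default_rng(0)
speakers = rng.normal(size=(20, 8, 6))

bias, basis = eigenvoice_space(speakers, k=5)
w = eigenvoice_weights(speakers[0], bias, basis)  # 5 eigenvoice weights

mean_speaker = speakers.mean(axis=0)
core, factors = tucker_hosvd(speakers - mean_speaker, ranks=(5, 4, 3))
v = tensor_speaker_weights(speakers[0], mean_speaker, core, factors)  # 5 tensor-space weights
```

Both constructions end with a few weights per speaker, but the Tucker factors for the component and dimension modes keep the row/column structure of the speaker matrix instead of treating all M*D values as one flat vector.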
Keywords :
Gaussian processes; eigenvalues and eigenfunctions; matrix algebra; speaker recognition; speech synthesis; tensors; EV-GMM; EVC; GMM supervectors; arbitrary speaker conversion; eigen-supervectors; eigenvoice Gaussian mixture model; eigenvoice conversion; flexible control; high-dimensional vectors; mean vectors; speaker characteristics; speaker representation; tensor analysis; tensor representation; tensor-based speaker space construction; voice conversion; weight parameters; Gaussian mixture model; Tucker decomposition; Voice conversion
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal Processing (ICSP), 2012 IEEE 11th International Conference on
Conference_Location :
Beijing
ISSN :
2164-5221
Print_ISBN :
978-1-4673-2196-9
Type :
conf
DOI :
10.1109/ICoSP.2012.6491558
Filename :
6491558