Sparse representation for frequency warping based voice conversion

Author

Xiaohai Tian ; Zhizheng Wu ; Siu Wa Lee ; Nguyen Quy Hy ; Eng Siong Chng ; Minghui Dong

Author_Institution

Sch. of Comput. Eng., Nanyang Technol. Univ. (NTU), Singapore, Singapore

fYear

2015

fDate

19-24 April 2015

Firstpage

4235

Lastpage

4239

Abstract

This paper presents a sparse representation framework for weighted frequency warping based voice conversion. In this method, a frame-dependent warping function and the corresponding spectral residual vector are first calculated for each source-target spectrum pair. At runtime, a source spectrum is factorised as a linear combination of a set of source spectra from the training data. The linear combination weight matrix, which is constrained to be sparse, is then used to interpolate the frame-dependent warping functions and spectral residual vectors. In this way, the proposed method not only avoids the statistical averaging caused by GMM but also preserves the high-resolution spectral details needed for high-quality converted speech. Experiments are conducted on the VOICES database. Both objective and subjective results confirm the effectiveness of the proposed method. In particular, the spectral distortion drops from 5.55 dB for the conventional frequency warping approach to 5.0 dB for the proposed method. Compared with the state-of-the-art GMM-based conversion with global variance (GV) enhancement, our method achieves a 68.5% preference score in an AB preference test.
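
The runtime step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the multiplicative-update solver, the L1 penalty value, the convention that each warping function stores the source bin position to read for every output bin, and the choice to add the residual in the same domain as the spectrum are all assumptions made for illustration. The sketch assumes non-negative spectra (for example magnitude spectra).

```python
import numpy as np

def sparse_weights(x, A, sparsity=0.1, n_iter=200, eps=1e-12):
    """Estimate non-negative sparse weights h so that A @ h approximates x.

    A holds one training source spectrum (exemplar) per column.
    Assumption: multiplicative updates minimising squared error plus an
    L1 penalty; the abstract does not say which solver the paper uses.
    """
    h = np.full(A.shape[1], 1.0 / A.shape[1])
    Atx = A.T @ x
    AtA = A.T @ A
    for _ in range(n_iter):
        # Update keeps h non-negative because every factor is non-negative.
        h *= Atx / (AtA @ h + sparsity + eps)
    return h

def convert_frame(x, A, warps, residuals, **solver_args):
    """Convert one source spectrum using interpolated warping and residual.

    warps[:, k]     : hypothetical warping function paired with exemplar k,
                      stored as the source bin position read by each output bin.
    residuals[:, k] : hypothetical spectral residual paired with exemplar k.
    """
    h = sparse_weights(x, A, **solver_args)
    h_norm = h / (h.sum() + 1e-12)       # weights sum to (almost) one
    warp = warps @ h_norm                # interpolated warping function
    resid = residuals @ h_norm           # interpolated residual vector
    bins = np.arange(len(x), dtype=float)
    warped = np.interp(warp, bins, x)    # resample x at the warped positions
    return warped + resid
```

With identity warping functions and zero residuals, `convert_frame` returns the input spectrum essentially unchanged, which is a convenient sanity check before plugging in trained exemplar pairs.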

Keywords

Gaussian processes; interpolation; matrix algebra; speech processing; GMM; VOICES database; frame-dependent warping function; high-quality converted speech; linear combination weight matrix; source-target spectrum pair; sparse representation framework; spectral residual vector; weighted frequency warping based voice conversion; Dictionaries; Discrete Fourier transforms; Distortion; Frequency conversion; Spectrogram; Speech; Speech processing; Voice conversion; exemplar; frequency warping; residual compensation; sparse representation

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7178769

Filename

7178769