• DocumentCode
    730661
  • Title
    Sparse representation for frequency warping based voice conversion
  • Author
    Xiaohai Tian ; Zhizheng Wu ; Siu Wa Lee ; Nguyen Quy Hy ; Eng Siong Chng ; Minghui Dong
  • Author_Institution
    Sch. of Comput. Eng., Nanyang Technol. Univ. (NTU), Singapore, Singapore
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4235
  • Lastpage
    4239
  • Abstract
    This paper presents a sparse representation framework for weighted frequency warping based voice conversion. In this method, a frame-dependent warping function and the corresponding spectral residual vector are first calculated for each source-target spectrum pair. At conversion time, a source spectrum is factorised as a linear combination of a set of source spectra from the training data. The linear combination weight matrix, which is constrained to be sparse, is used to interpolate the frame-dependent warping functions and spectral residual vectors. In this way, the proposed method not only avoids the statistical averaging caused by GMM but also preserves the high-resolution spectral details needed for high-quality converted speech. Experiments are conducted on the VOICES database. Both objective and subjective results confirm the effectiveness of the proposed method. In particular, the spectral distortion dropped from 5.55 dB with the conventional frequency warping approach to 5.0 dB with the proposed method. Compared with the state-of-the-art GMM-based conversion with global variance (GV) enhancement, our method achieved a 68.5% preference score in an AB preference test.
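    The conversion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`sparse_weights`, `interpolate`, `warp_spectrum`, `convert`), the L1-penalised Euclidean multiplicative update used to obtain non-negative sparse weights, the representation of a warping function as a list of fractional source-bin positions, and the toy four-bin vectors.

```python
def sparse_weights(x, exemplars, lam=0.1, iters=50, eps=1e-12):
    """Non-negative weights h such that x is roughly sum_n h[n] * exemplars[n].

    Assumed solver: multiplicative updates for squared error plus an
    L1 penalty (lam), which pushes most weights towards zero.
    """
    n = len(exemplars)
    h = [1.0] * n
    # Precompute A^T x and the Gram matrix A^T A for the exemplar dictionary A.
    atx = [sum(a * b for a, b in zip(e, x)) for e in exemplars]
    gram = [[sum(a * b for a, b in zip(ei, ej)) for ej in exemplars]
            for ei in exemplars]
    for _ in range(iters):
        h = [h[i] * atx[i]
             / (sum(gram[i][j] * h[j] for j in range(n)) + lam + eps)
             for i in range(n)]
    return h


def interpolate(weights, vectors):
    """Weighted average of per-exemplar vectors (warping functions or residuals)."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("all weights are zero; nothing to interpolate")
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) / total
            for d in range(dim)]


def warp_spectrum(spec, warp):
    """Resample spec at the fractional source-bin positions given by warp."""
    last = len(spec) - 1
    out = []
    for pos in warp:
        pos = min(max(pos, 0.0), float(last))  # clamp to the valid bin range
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * spec[lo] + frac * spec[hi])
    return out


def convert(x, exemplars, warps, residuals, lam=0.1):
    """Warp the source spectrum with the interpolated function, then add the residual."""
    h = sparse_weights(x, exemplars, lam)
    warped = warp_spectrum(x, interpolate(h, warps))
    residual = interpolate(h, residuals)
    return [w + r for w, r in zip(warped, residual)]


# Toy training data (hypothetical): three source exemplars, each paired with
# a frame-dependent warping function and a spectral residual vector.
exemplars = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
warps = [[0.0, 0.5, 1.5, 3.0], [0.0, 1.0, 2.0, 3.0], [0.0, 1.5, 2.5, 3.0]]
residuals = [[0.1, 0.0, 0.0, 0.0], [0.0, 0.2, 0.0, 0.0], [0.0, 0.0, 0.3, 0.0]]

source = [1.0, 0.0, 0.0, 0.0]
print(sparse_weights(source, exemplars))
print(convert(source, exemplars, warps, residuals))
```

    In this toy case the source frame matches the first exemplar, so only the first weight stays non-zero and the interpolated warping function and residual reduce to those of that exemplar. With real spectra several exemplars would typically remain active, and the sparsity penalty `lam` controls how many.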
  • Keywords
    Gaussian processes; interpolation; matrix algebra; speech processing; GMM; VOICES database; frame-dependent warping function; high-quality converted speech; linear combination weight matrix; source-target spectrum pair; sparse representation framework; spectral residual vector; weighted frequency warping based voice conversion; Dictionaries; Discrete Fourier transforms; Distortion; Frequency conversion; Spectrogram; Speech; Speech processing; Voice conversion; exemplar; frequency warping; residual compensation; sparse representation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178769
  • Filename
    7178769