• DocumentCode
    180495
  • Title
    Using bidirectional associative memories for joint spectral envelope modeling in voice conversion
  • Author
    Li-Juan Liu ; Ling-Hui Chen ; Zhen-Hua Ling ; Li-Rong Dai
  • Author_Institution
    Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    7884
  • Lastpage
    7888
  • Abstract
    The spectral envelope is the most natural representation of the speech signal. In voice conversion, however, it is difficult to model the raw spectral envelope space directly with conventional Gaussian distributions, because the space is high-dimensional and strongly correlated across dimensions. Bidirectional associative memory (BAM) is a two-layer feedback neural network that can better model the cross-dimensional correlations in high-dimensional vectors. In this paper, we propose to reformulate BAMs as Gaussian distributions in order to model the spectral envelope space. The parameters of BAMs are estimated using the contrastive divergence algorithm. Likelihood evaluations show that BAMs have better modeling ability than Gaussians with diagonal covariance, and subjective tests on voice conversion indicate that the proposed method performs significantly better than the conventional GMM-based method.
  • Keywords
    Gaussian distribution; correlation methods; recurrent neural nets; signal representation; speech recognition; telecommunication computing; BAM; bidirectional associative memories; contrastive divergence algorithm; conventional Gaussian distributions; cross-dimensional correlations; diagonal covariance; high dimensional vectors; joint spectral envelope; natural representation; raw spectral envelope space; speech signal; two-layer feedback neural network; voice conversion; Covariance matrices; Gaussian distribution; Hidden Markov models; Joints; Speech; Training; Vectors; Spectral envelope modeling; bidirectional associative memory; contrastive divergence; voice conversion;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6855135
  • Filename
    6855135