• DocumentCode
    3585151
  • Title

    Comprehensive Voice Conversion Analysis Based on DGMM and Feature Combination

  • Author

    He Pan ; Yangjie Wei ; Nan Guan ; Yi Wang

  • Author_Institution
    Northeastern Univ., Shenyang, China
  • fYear
    2014
  • Firstpage
    159
  • Lastpage
    164
  • Abstract
    A voice conversion system modifies a speaker's voice so that it is perceived as having been uttered by another speaker, and such systems are now widely used in real applications. However, most research focuses on only one aspect of voice conversion performance, and theoretical analysis and experimental comparison of the whole source-target speaker voice conversion process are rare. Therefore, this paper presents a comprehensive analysis of source-target speaker voice conversion based on three key steps: acoustic feature selection and extraction, voice conversion model construction, and target speech synthesis. A complete and optimal source-target speaker voice conversion process is proposed. First, a comprehensive feature combination consisting of prosodic features, spectrum parameters, and spectral envelope characteristics is proposed. Then, to avoid discontinuity and spectrum distortion in the converted speech, a DGMM (Dynamic Gaussian Mixture Model) that considers dynamic information between frames is presented. Subsequently, for speech synthesis, the STRAIGHT algorithm synthesizer is modified to incorporate the feature combination. Finally, objective comparison experiments show that the new source-target voice conversion process achieves better performance than conventional methods. In addition, a speaker recognition system is used to evaluate the quality of the converted speech, and the experimental results show that the converted speech has higher target speaker individuality and speech quality.
  • Keywords
    speaker recognition; speech processing; speech synthesis; DGMM; STRAIGHT algorithm synthesizer; acoustic features selection; aspect performance; comprehensive voice conversion analysis; converted speech; dynamic Gaussian mixture model; dynamic information; extraction; feature combination; optimal source-target speaker voice conversion; prosodic feature; source-target speaker voice conversion process; source-target voice conversion process; speaker individuality; speaker recognition system; speaker uttered; spectral envelope characteristic; spectrum distortion; spectrum parameter; speech quality; target speech synthesis; voice conversion model construction; voice conversion system; Acoustics; Feature extraction; Speaker recognition; Speech; Speech processing; Speech recognition; Vectors; DGMM; STRAIGHT synthesis; feature combination; speaker recognition; voice conversion;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Modelling Symposium (AMS), 2014 8th Asia
  • Print_ISBN
    978-1-4799-6486-4
  • Type

    conf

  • DOI
    10.1109/AMS.2014.39
  • Filename
    7079292