  • DocumentCode
    48938
  • Title
    A Unified Trajectory Tiling Approach to High Quality Speech Rendering
  • Author
    Yao Qian; Frank K. Soong; Zhi-Jie Yan
  • Author_Institution
    Microsoft Res. Asia, Beijing, China
  • Volume
    21
  • Issue
    2
  • fYear
    2013
  • fDate
    Feb. 2013
  • Firstpage
    280
  • Lastpage
    290
  • Abstract
    It is technically challenging to make a machine talk as naturally as a human so as to facilitate “frictionless” interactions between machine and human. We propose a trajectory tiling-based approach to high-quality speech rendering, where speech parameter trajectories, extracted from natural, processed, or synthesized speech, are used to guide the search for the best sequence of waveform “tiles” stored in a pre-recorded speech database. We test the proposed unified algorithm in both Text-To-Speech (TTS) synthesis and cross-lingual voice transformation applications. Experimental results show that the proposed trajectory tiling approach can render speech which is both natural and highly intelligible. The perceived high quality of rendered speech is also confirmed in both objective and subjective evaluations.
  • Keywords
    speech processing; speech synthesis; cross-lingual voice transformation; frictionless interactions; high quality speech rendering; machine talk; natural speech; pre-recorded speech database; processed speech; speech parameter trajectory; synthesized speech; text-to-speech synthesis; unified trajectory tiling; waveform tiles; Hidden Markov models; Rendering (computer graphics); Speech; Speech processing; Training data; Trajectory; Cross-lingual; speech synthesis; trajectory tiling; voice transformation
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2012.2221460
  • Filename
    6317143