• DocumentCode
    3000692
  • Title

    Improvement of naturalness for an HMM-based Vietnamese speech synthesis using the prosodic information

  • Author

    Thanh-Son Phan ; Tu-Cuong Duong ; Anh-Tuan Dinh ; Tat-Thang Vu ; Chi-Mai Luong

  • Author_Institution
    Fac. of Inf. Technol., Le Qui Don Tech. Univ., Hanoi, Vietnam
  • fYear
    2013
  • fDate
    10-13 Nov. 2013
  • Firstpage
    276
  • Lastpage
    281
  • Abstract
    Natural-sounding synthesized speech is the goal of HMM-based text-to-speech (TTS) systems. Besides context-dependent tri-phone units drawn from a large speech corpus, many prosodic features have been used in full-context labels to improve the naturalness of an HMM-based Vietnamese synthesizer. In the prosodic specification, tone, part-of-speech (POS), and intonation information are considered less important than positional information. Context-dependent information includes the phoneme sequence as well as prosodic information, because the naturalness of synthetic speech depends heavily on prosody such as pauses, tone, intonation pattern, and segmental duration. In this paper, we propose decision tree questions that use context-dependent tones and investigate the impact of POS and intonation tagging on the naturalness of an HMM-based voice. Experimental results from an objective evaluation and a MOS test show that the proposed method can improve the naturalness of an HMM-based Vietnamese TTS system.
  • Keywords
    decision trees; hidden Markov models; natural language processing; speech synthesis; HMM-based Vietnamese TTS naturalness improvement; HMM-based Vietnamese speech synthesis; HMM-based text-to-speech systems; HMM-based voice; MOS test; POS; context dependent triphone units; context-dependent information; context-dependent tones; decision tree questions; full-context labels; hidden Markov models; intonation information; intonation pattern; intonation tagging; large corpus speech database; natural-sounding synthesized speech; objective evaluation; part-of-speech; pause; phoneme sequence; positional information; prosodic information; prosodic specification; prosody features; segmental duration; synthetic speech; Context; Databases; Decision trees; Hidden Markov models; Speech; Training; Vectors; HMM; HTS; Vietnamese Speech Synthesis; context-dependent; decision tree-based clustering; part-of-speech; prosodic information; tri-phone;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2013 IEEE RIVF International Conference on
  • Conference_Location
    Hanoi
  • Print_ISBN
    978-1-4799-1349-7
  • Type

    conf

  • DOI
    10.1109/RIVF.2013.6719907
  • Filename
    6719907