DocumentCode
3000692
Title
Improvement of naturalness for an HMM-based Vietnamese speech synthesis using the prosodic information
Author
Thanh-Son Phan ; Tu-Cuong Duong ; Anh-Tuan Dinh ; Tat-Thang Vu ; Chi-Mai Luong
Author_Institution
Fac. of Inf. Technol., Le Qui Don Tech. Univ., Hanoi, Vietnam
fYear
2013
fDate
10-13 Nov. 2013
Firstpage
276
Lastpage
281
Abstract
Natural-sounding synthesized speech is the goal of HMM-based text-to-speech (TTS) systems. Besides context-dependent tri-phone units drawn from a large-corpus speech database, many prosodic features have been used in full-context labels to improve the naturalness of HMM-based Vietnamese synthesizers. In the prosodic specification, tone, part-of-speech (POS), and intonation information are considered less important than positional information. Context-dependent information includes the phoneme sequence as well as prosodic information, because the naturalness of synthetic speech depends heavily on prosody such as pauses, tone, intonation pattern, and segmental duration. In this paper, we propose decision tree questions that use context-dependent tones and investigate the impact of POS and intonation tagging on the naturalness of an HMM-based voice. Experimental results from an objective evaluation and a MOS test show that the proposed method can improve the naturalness of an HMM-based Vietnamese TTS system.
Keywords
decision trees; hidden Markov models; natural language processing; speech synthesis; HMM-based Vietnamese TTS naturalness improvement; HMM-based Vietnamese speech synthesis; HMM-based text-to-speech systems; HMM-based voice; MOS test; POS; context dependent triphone units; context-dependent information; context-dependent tones; decision tree questions; full-context labels; hidden Markov models; intonation information; intonation pattern; intonation tagging; large corpus speech database; natural-sounding synthesized speech; objective evaluation; part-of-speech; pause; phoneme sequence; positional information; prosodic information; prosodic specification; prosody features; segmental duration; synthetic speech; Context; Databases; Decision trees; Hidden Markov models; Speech; Training; Vectors; HMM; HTS; Vietnamese Speech Synthesis; context-dependent; decision tree-based clustering; part-of-speech; prosodic information; tri-phone;
fLanguage
English
Publisher
ieee
Conference_Titel
Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2013 IEEE RIVF International Conference on
Conference_Location
Hanoi
Print_ISBN
978-1-4799-1349-7
Type
conf
DOI
10.1109/RIVF.2013.6719907
Filename
6719907