Regional Information Center for Science and Technology - A real-time Cantonese text-to-audiovisual speech synthesizer

DocumentCode :

417230

Title :

A real-time Cantonese text-to-audiovisual speech synthesizer

Author :

Wang, Jian-Qing ; Wong, Ka-Ho ; Heng, Pheng-Ann ; Meng, Helen M. ; Wong, Tien-Tsin

Author_Institution :

Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, China

Volume :

fYear :

2004

fDate :

17-21 May 2004

Abstract :

This paper describes the design and development of a Cantonese text-to-audiovisual speech (TTVS) synthesizer, which can generate highly natural synthetic speech that is precisely time-synchronized with a real-time 3D face rendering. Our Cantonese TTVS synthesizer utilizes a homegrown Cantonese syllable-based concatenative text-to-speech system named CU VOCAL. We extend CU VOCAL to output syllable labels and durations that correspond to the output acoustic wave file. The syllables are decomposed and their initials/finals are mapped to the nearest IPA symbols that correspond to static viseme models. We have authored sixteen static viseme models together with two emotion-based face models. To achieve 3D face rendering, we have designed and implemented a blending technique that computes linear combinations of the static face models to produce smooth transitions between models. We demonstrate that this design and implementation of a TTVS synthesizer can achieve real-time performance in generation.
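
The blending step described in the abstract can be illustrated with a minimal sketch. It assumes each static viseme model is a list of 3D vertices with a shared vertex order, and that the synthesizer supplies a track of (viseme, duration) pairs derived from the syllable durations. The names `blend` and `face_at`, the mesh representation, and the timing scheme are hypothetical and are not taken from the paper or from CU VOCAL.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, ...]
Mesh = List[Vertex]


def blend(a: Mesh, b: Mesh, t: float) -> Mesh:
    """Return the linear combination (1 - t) * a + t * b, vertex by vertex."""
    if len(a) != len(b):
        raise ValueError("meshes must share the same vertex count")
    t = min(1.0, max(0.0, t))  # clamp the weight to [0, 1]
    return [
        tuple((1.0 - t) * p + t * q for p, q in zip(va, vb))
        for va, vb in zip(a, b)
    ]


def face_at(time_s: float,
            track: List[Tuple[str, float]],
            visemes: Dict[str, Mesh]) -> Mesh:
    """Face mesh at time_s for a track of (viseme name, duration in seconds).

    Assumption: each viseme is fully shown at the start of its segment and
    then moves linearly toward the next viseme over the segment's duration.
    """
    start = 0.0
    for i, (name, duration) in enumerate(track):
        is_last = i == len(track) - 1
        if time_s < start + duration and not is_last:
            next_name = track[i + 1][0]
            t = (time_s - start) / duration
            return blend(visemes[name], visemes[next_name], t)
        start += duration
    # Final segment or past the end of the track: hold the last viseme.
    return list(visemes[track[-1][0]])
```

Calling `face_at` once per rendered frame with the current audio playback time keeps the face aligned with the speech, since both are driven by the same durations. A weighted sum over more than two models (for example, adding an emotion model) would follow the same pattern with one weight per model.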

Keywords :

real-time systems; rendering (computer graphics); speech synthesis; synchronisation; CU VOCAL; Cantonese text-to-audiovisual speech; IPA symbols; TTVS synthesizer; acoustic wave file; blending technique; concatenative text-to-speech system; durations; emotion-based face models; highly natural synthetic speech; real-time 3D face rendering; real-time speech synthesizer; static viseme models; syllable labels; text-to-audiovisual speech synthesizer; time-synchronization; Design engineering; Facial animation; Financial advantage program; Head; Hidden Markov models; Real time systems; Speech synthesis; Synthesizers; Virtual reality; Visualization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on

ISSN :

1520-6149

Print_ISBN :

0-7803-8484-9

Type :

conf

DOI :

10.1109/ICASSP.2004.1326070

Filename :

1326070

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=417230