Regional Information Center for Science and Technology (RICEST) - Development of text-to-audiovisual speech synthesis to support interactive language learning on a mobile device

DocumentCode :

2989873

Title :

Development of text-to-audiovisual speech synthesis to support interactive language learning on a mobile device

Author :

Wai-Kim Leung ; Ka-Wa Yuen ; Ka-Ho Wong ; Meng, Hsiang-Yun

Author_Institution :

Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, Hong Kong, China

fYear :

2013

fDate :

2-5 Dec. 2013

Firstpage :

583

Lastpage :

588

Abstract :

We have developed a distributed text-to-audiovisual-speech synthesizer (TTAVS) to support interactivity in computer-aided pronunciation training (CAPT) on a mobile platform. The TTAVS generates audiovisual corrective feedback based on mispronunciations detected in the second-language learner's speech. Our approach encodes key visemes in SVG format, compresses them with GZIP, and transmits them to the client, where the browser performs real-time morphing to render the visual speech. We have also developed a TTAVS animation player that plays the audio and visual speech synchronously and gives the user play/pause/resume controls. Evaluation shows that, compared with our original approach of generating an Ogg video on the server and streaming it to the client, the new approach reduces the average size of the output files transmitted from the server to the client by 66% and client waiting times by 83%, while preserving image quality.
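The pipeline described in the abstract (key visemes serialized as SVG, GZIP compression for transfer, morphing on the client) can be illustrated with a minimal sketch. This is not the authors' implementation: the viseme names, control points, JSON wrapper, and all function names below are hypothetical, and linear interpolation stands in for whatever morphing method the paper actually uses. In the described system the morphing runs in the browser; Python is used here only to show the idea.

```python
import gzip
import json

# Hypothetical key visemes: each is a list of (x, y) control points
# for a simplified mouth outline. Real viseme shapes are not given in the record.
VISEMES = {
    "rest": [(10.0, 20.0), (30.0, 20.0), (50.0, 20.0)],
    "open": [(10.0, 20.0), (30.0, 40.0), (50.0, 20.0)],
}


def to_svg_path(points):
    """Serialize control points as an SVG path string of straight segments."""
    head, *tail = points
    segments = ["M {:.1f} {:.1f}".format(*head)]
    segments += ["L {:.1f} {:.1f}".format(x, y) for x, y in tail]
    return " ".join(segments)


def pack_visemes(visemes):
    """Server side (assumed): bundle SVG paths as JSON and GZIP-compress them."""
    payload = json.dumps({name: to_svg_path(pts) for name, pts in visemes.items()})
    return gzip.compress(payload.encode("utf-8"))


def unpack_visemes(blob):
    """Client side (assumed): decompress and decode the viseme bundle."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


def morph(src, dst, t):
    """Linearly interpolate between two viseme shapes; t runs from 0.0 to 1.0."""
    return [
        (sx + (dx - sx) * t, sy + (dy - sy) * t)
        for (sx, sy), (dx, dy) in zip(src, dst)
    ]


if __name__ == "__main__":
    blob = pack_visemes(VISEMES)
    print(unpack_visemes(blob)["rest"])
    # Intermediate frame halfway between the two key visemes.
    print(to_svg_path(morph(VISEMES["rest"], VISEMES["open"], 0.5)))
```

Sending only a few key shapes and generating in-between frames on the client is what makes the transferred data small relative to a pre-rendered video, which is consistent with the file-size reduction the abstract reports.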

Keywords :

audio streaming; audio-visual systems; client-server systems; computer animation; computer based training; interactive video; mobile computing; natural language processing; rendering (computer graphics); speech synthesis; video streaming; GZIP; Ogg video generation; Ogg video streaming; SVG format compression; TTAVS animation player; audio speech rendering; audiovisual corrective feedback; browser; client waiting time reduction; client-server system; computer aided pronunciation training; distributed TTAVS; image quality preservation; interactive language learning; mispronunciation detection; mispronunciation generation; mobile device; play-pause-resume; real-time morphing; second language learner speech; text-to-audiovisual speech synthesis; user control; vis-à-vis; visual speech rendering; Animation; Generators; Servers; Speech; Speech synthesis; Streaming media; Visualization; computer-aided pronunciation training system (CAPT); language learning; visual speech synthesizer

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Cognitive Infocommunications (CogInfoCom), 2013 IEEE 4th International Conference on

Conference_Location :

Budapest

Print_ISBN :

978-1-4799-1543-9

Type :

conf

DOI :

10.1109/CogInfoCom.2013.6719170

Filename :

6719170

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2989873