DocumentCode :
2989873
Title :
Development of text-to-audiovisual speech synthesis to support interactive language learning on a mobile device
Author :
Wai-Kim Leung ; Ka-Wa Yuen ; Ka-Ho Wong ; Meng, Hsiang-Yun
Author_Institution :
Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, Hong Kong, China
fYear :
2013
fDate :
2-5 Dec. 2013
Firstpage :
583
Lastpage :
588
Abstract :
We have developed a distributed text-to-audiovisual-speech synthesizer (TTAVS) to support interactivity in computer-aided pronunciation training (CAPT) on a mobile platform. The TTAVS generates audiovisual corrective feedback based on mispronunciations detected in the second-language learner's speech. Our approach encodes key visemes in SVG format; these are compressed with GZIP and transmitted to the client, where the browser performs real-time morphing to render the visual speech. We have also developed a TTAVS animation player that plays the audio and visual speech synchronously while offering play/pause/resume user controls. Evaluation shows that, compared with our original approach of generating an Ogg video on the server and streaming it to the client, the new approach achieves a 66% reduction in the average size of the output files transmitted from the server to the client and an 83% reduction in client waiting time, while preserving image quality.
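The SVG-plus-GZIP pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the control points, the function names (`morph`, `to_svg_path`, `compress_svg`), and the use of plain linear interpolation are all assumptions made for illustration, and the morphing that the paper performs in the browser is simulated here in Python.

```python
import gzip


def morph(points_a, points_b, t):
    """Linearly interpolate between two key visemes (assumed morphing scheme).

    t = 0 returns the first shape, t = 1 the second.
    """
    if len(points_a) != len(points_b):
        raise ValueError("key visemes must share the same number of control points")
    return [
        (ax + (bx - ax) * t, ay + (by - ay) * t)
        for (ax, ay), (bx, by) in zip(points_a, points_b)
    ]


def to_svg_path(points):
    """Serialize control points as a closed SVG path."""
    head, *tail = points
    segments = " ".join(f"L {x:.1f} {y:.1f}" for x, y in tail)
    d = f"M {head[0]:.1f} {head[1]:.1f} {segments} Z"
    return f'<svg xmlns="http://www.w3.org/2000/svg"><path d="{d}"/></svg>'


def compress_svg(svg_text):
    """GZIP-compress an SVG document before sending it to the client."""
    return gzip.compress(svg_text.encode("utf-8"))


# Hypothetical mouth outlines for two key visemes (lips closed, lips open).
CLOSED = [(10.0, 50.0), (50.0, 48.0), (90.0, 50.0), (50.0, 52.0)]
OPEN = [(10.0, 50.0), (50.0, 30.0), (90.0, 50.0), (50.0, 70.0)]

# Server side (sketch): only the key visemes are serialized and compressed.
payloads = [compress_svg(to_svg_path(shape)) for shape in (CLOSED, OPEN)]

# Client side (sketch): in-between frames are computed locally, not transmitted.
halfway = morph(CLOSED, OPEN, 0.5)
```

Sending only compressed key frames and interpolating the in-between frames on the client is one plausible reason the transmitted files are smaller than a server-rendered video, though the record itself does not break down where the savings come from.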
Keywords :
audio streaming; audio-visual systems; client-server systems; computer animation; computer based training; interactive video; mobile computing; natural language processing; rendering (computer graphics); speech synthesis; video streaming; GZIP; Ogg video generation; Ogg video streaming; SVG format compression; TTAVS animation player; audio speech rendering; audiovisual corrective feedback; browser; client waiting time reduction; client-server system; computer aided pronunciation training; distributed TTAVS; image quality preservation; interactive language learning; mispronunciation detection; mispronunciation generation; mobile device; play-pause-resume; real-time morphing; second language learner speech; text-to-audiovisual speech synthesis; user control; vis-à-vis; visual speech rendering; Animation; Generators; Servers; Speech; Speech synthesis; Streaming media; Visualization; computer aided-pronunciation training system (CAPT); language learning; visual speech synthesizer;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cognitive Infocommunications (CogInfoCom), 2013 IEEE 4th International Conference on
Conference_Location :
Budapest
Print_ISBN :
978-1-4799-1543-9
Type :
conf
DOI :
10.1109/CogInfoCom.2013.6719170
Filename :
6719170