Title :
Using viseme based acoustic models for speech driven lip synthesis
Author :
Verma, A. ; Rajput, Nitendra ; Subramaniam, L.V.
Author_Institution :
IBM India Res. Lab., Indian Inst. of Technol., New Delhi, India
Abstract :
Speech driven lip synthesis is an interesting and important step toward human-computer interaction. An incoming speech signal is time aligned using a speech recognizer to generate a phonetic sequence, which is then converted to the corresponding viseme sequence to be animated. In this paper, we present a novel method for generating the viseme sequence that uses viseme based acoustic models, instead of the usual phone based acoustic models, to align the input speech signal. This improves both the accuracy and the speed of the alignment procedure and allows a much simpler implementation of the speech driven lip synthesis system, as it completely obviates the need for acoustic unit to visual unit conversion. We show through various experiments that the proposed method yields about 53% relative improvement in classification accuracy and about 52% reduction in the time required to compute alignments.
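To make the conventional pipeline concrete, the sketch below illustrates the phone-to-viseme conversion step that the proposed viseme based models eliminate. The mapping table and segment format are hypothetical (real systems map roughly 40 phones onto 10-20 visemes); this is a minimal illustration, not the paper's implementation.

```python
# Hypothetical phone-to-viseme table (illustrative subset; many phones
# share one viseme because they look alike on the lips).
PHONE_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "iy": "spread", "ih": "spread",
    "aa": "open", "ao": "open",
    "sil": "neutral",
}

def phones_to_visemes(aligned_phones):
    """Collapse a time-aligned phone sequence, given as
    (phone, start, end) tuples, into a viseme sequence,
    merging consecutive segments that map to the same viseme."""
    visemes = []
    for phone, start, end in aligned_phones:
        viseme = PHONE_TO_VISEME.get(phone, "neutral")
        if visemes and visemes[-1][0] == viseme:
            # Same viseme as the previous segment: extend it.
            visemes[-1] = (viseme, visemes[-1][1], end)
        else:
            visemes.append((viseme, start, end))
    return visemes

if __name__ == "__main__":
    alignment = [("sil", 0.0, 0.1), ("b", 0.1, 0.2), ("aa", 0.2, 0.4),
                 ("m", 0.4, 0.5), ("p", 0.5, 0.6)]
    print(phones_to_visemes(alignment))
```

With viseme based acoustic models, the recognizer emits viseme labels directly from the speech signal, so this many-to-one conversion pass is never needed.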
Keywords :
acoustic signal processing; human computer interaction; image sequences; speech processing; speech recognition; speech synthesis; acoustic unit to visual unit conversion; human-computer interaction; incoming speech signal; phone based acoustic models; phonetic sequence; speech driven lip synthesis; speech recognizer; viseme based acoustic models; Animation; Hidden Markov models; Humans; Image databases; Image segmentation; Neural networks; Signal generators; Signal synthesis; Speech recognition; Speech synthesis;
Conference_Title :
2003 International Conference on Multimedia and Expo (ICME '03), Proceedings
Print_ISBN :
0-7803-7965-9
DOI :
10.1109/ICME.2003.1221366