Title :
Improving continuous gesture recognition with spoken prosody
Author :
Kettebekov, Sanshzar ; Yeasin, Mohammed ; Sharma, Rajeev
Author_Institution :
Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
Abstract :
Despite recent advances in gesture recognition, reliance on the visual signal alone to classify unrestricted continuous gesticulation is inherently error-prone. Since spontaneous gesticulation is mostly coverbal in nature, some attempts have been made to use speech cues to improve gesture recognition, e.g., keyword-gesture co-analysis. Such schemes, however, are burdened by the complexity of natural language understanding. This paper offers a "signal-level" perspective by exploring prosodic phenomena of spontaneous gesture and speech co-production. We present a computational framework for improving continuous gesture recognition based on two phenomena that capture voluntary (co-articulation) and involuntary (physiological) contributions of prosodic synchronization. Physiological constraints, manifested as signal interruptions in multimodal production, are exploited in an audio-visual feature integration framework using hidden Markov models (HMMs). Co-articulation is analyzed using a Bayesian network of naive classifiers to explore the alignment of intonationally prominent speech segments with hand kinematics. The efficacy of the proposed approach is demonstrated on a multimodal corpus created from Weather Channel broadcasts. Both schemes were found to contribute uniquely by reducing different error types, thereby improving the performance of continuous gesture recognition.
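The paper does not publish code, but the idea of fusing a prosodic cue with hand kinematics via naive (conditionally independent) classifiers can be sketched as follows. This is a hypothetical illustration only, not the authors' implementation: the feature names (pitch prominence, hand velocity), the two gesture labels ("stroke" vs. "retraction"), and the toy training data are all assumptions made for the example, and a single Gaussian naive Bayes classifier stands in for the full Bayesian network described in the abstract.

```python
import math

# Illustrative sketch (not the authors' system): a Gaussian naive Bayes
# classifier that fuses one prosodic feature (pitch prominence) with one
# kinematic feature (hand velocity) to label a gesture segment.

def fit(samples):
    """samples: {label: [(prominence, velocity), ...]} -> per-class model."""
    model = {}
    total = sum(len(rows) for rows in samples.values())
    for label, rows in samples.items():
        n = len(rows)
        stats = []
        for d in range(2):  # naive assumption: features independent per class
            vals = [r[d] for r in rows]
            mean = sum(vals) / n
            var = sum((v - mean) ** 2 for v in vals) / n + 1e-6  # smoothing
            stats.append((mean, var))
        model[label] = (n / total, stats)
    return model

def log_gauss(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(model, x):
    """Pick the label maximizing log prior + sum of per-feature log likelihoods."""
    best, best_lp = None, float("-inf")
    for label, (prior, stats) in model.items():
        lp = math.log(prior) + sum(
            log_gauss(x[d], m, v) for d, (m, v) in enumerate(stats))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Toy data: strokes tend to co-occur with prominent pitch and high velocity.
train = {
    "stroke":     [(0.9, 1.2), (0.8, 1.0), (1.0, 1.1)],
    "retraction": [(0.1, 0.4), (0.2, 0.3), (0.15, 0.5)],
}
model = fit(train)
print(classify(model, (0.85, 1.05)))  # prominent pitch + fast hand -> "stroke"
print(classify(model, (0.12, 0.35)))  # neither cue present -> "retraction"
```

A network of such naive classifiers, as in the paper, would combine several of these per-cue decisions; the sketch shows only the fusion step for a single segment.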
Keywords :
belief networks; computer vision; gesture recognition; hidden Markov models; speech recognition; speech synthesis; Bayesian network; HMM; Weather Channel broadcast; audio-visual feature integration; coarticulation analysis; continuous gesticulation; continuous gesture recognition; error type reduction; gesture-speech coproduction; hand kinematics; hidden Markov model; intonationally prominent speech segment; keyword-gesture coanalysis; multimodal corpus; multimodal production; naive classifier; natural language; physiological constraint; physiological contribution; prosodic phenomena; prosodic synchronization; signal interruption; signal-level perspective; speech cue; spoken prosody; spontaneous gesticulation; visual signal; Bayesian methods; Computer errors; Computer science; Hidden Markov models; Human computer interaction; Kinematics; Laboratories; Natural languages; Speech analysis; Speech recognition;
Conference_Titel :
Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003)
Print_ISBN :
0-7695-1900-8
DOI :
10.1109/CVPR.2003.1211404