DocumentCode :
1247834
Title :
Prosody based audiovisual coanalysis for coverbal gesture recognition
Author :
Kettebekov, Sanshzar ; Yeasin, Mohammed ; Sharma, Rajeev
Volume :
7
Issue :
2
fYear :
2005
fDate :
4/1/2005
Firstpage :
234
Lastpage :
242
Abstract :
Despite recent advances in vision-based gesture recognition, its applications remain largely limited to artificially defined and well-articulated gesture signs used for human-computer interaction. A key reason for this is the low recognition rates for "natural" gesticulation. Previous attempts to use speech cues to reduce error-proneness of visual classification have been mostly limited to keyword-gesture coanalysis. Such a scheme inherits complexity and delays associated with natural language processing. This paper offers a novel "signal-level" perspective, where prosodic manifestations in speech and hand kinematics are considered as a basis for coanalyzing loosely coupled modalities. We present a computational framework for improving continuous gesture recognition based on two phenomena that capture voluntary (coarticulation) and involuntary (physiological) contributions of prosodic synchronization. Physiological constraints, manifested as signal interruptions during multimodal production, are exploited in an audiovisual feature integration framework using hidden Markov models. Coarticulation is analyzed using a Bayesian network of naive classifiers to explore alignment of intonationally prominent speech segments and hand kinematics. The efficacy of the proposed approach was demonstrated on a multimodal corpus created from the Weather Channel broadcast. Both schemas were found to contribute uniquely by reducing different error types, which subsequently improves the performance of continuous gesture recognition.
Keywords :
audio-visual systems; belief networks; gesture recognition; hidden Markov models; human-computer interaction; image classification; image segmentation; physiology; speech recognition; Bayesian network; audiovisual coanalysis; coverbal gesture recognition; multimodal production; natural language processing; prosody; visual classification; Bayesian methods; broadcasting; computer science; delay; kinematics; speech analysis; multimodal fusion
fLanguage :
English
Journal_Title :
Multimedia, IEEE Transactions on
Publisher :
IEEE
ISSN :
1520-9210
Type :
jour
DOI :
10.1109/TMM.2004.840590
Filename :
1407896