DocumentCode :
2954563
Title :
Parsing video events with goal inference and intent prediction
Author :
Pei, Mingtao ; Jia, Yunde ; Zhu, Song-Chun
Author_Institution :
Lab. of Intell. Inf. Technol., Beijing Inst. of Technol., Beijing, China
fYear :
2011
fDate :
6-13 Nov. 2011
Firstpage :
487
Lastpage :
494
Abstract :
In this paper, we present an event parsing algorithm based on a Stochastic Context Sensitive Grammar (SCSG) for understanding events, inferring the goals of agents, and predicting their plausible intended actions. The SCSG represents the hierarchical composition of events and the temporal relations between sub-events. The alphabet of the SCSG consists of atomic actions, which are defined by the poses of agents and their interactions with objects in the scene. The temporal relations, which are learned from the training data, are used to distinguish events with similar structures and to interpolate missing portions of events. In comparison with existing methods, our paper makes the following contributions. i) We define atomic actions by a set of relations based on the fluents of agents and their interactions with objects in the scene. ii) Our algorithm handles event insertion and multi-agent events, keeps all possible interpretations of the video to preserve the ambiguities, and achieves the globally optimal parsing solution in a Bayesian framework. iii) The algorithm infers the goals of the agents and predicts their intents by a top-down process. iv) The algorithm improves the detection of atomic actions by using event contexts. We show satisfactory results for event recognition and atomic action detection on a data set we captured, which contains 12 event categories in both indoor and outdoor videos.
Keywords :
Bayes methods; context-sensitive grammars; inference mechanisms; multi-agent systems; video signal processing; Bayesian framework; atomic actions; event insertion; event recognition; goal inference; indoor videos; intent prediction; multiagent events; optimal parsing solution; outdoor videos; stochastic context sensitive grammar; temporal relations; top-down process; video event parsing algorithm; Atomic clocks; Context; Grammar; Hidden Markov models; Portable computers; Prediction algorithms; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Vision (ICCV), 2011 IEEE International Conference on
Conference_Location :
Barcelona
ISSN :
1550-5499
Print_ISBN :
978-1-4577-1101-5
Type :
conf
DOI :
10.1109/ICCV.2011.6126279
Filename :
6126279