• DocumentCode
    2954563
  • Title
    Parsing video events with goal inference and intent prediction
  • Author
    Pei, Mingtao ; Jia, Yunde ; Zhu, Song-Chun
  • Author_Institution
    Lab. of Intell. Inf. Technol., Beijing Inst. of Technol., Beijing, China
  • fYear
    2011
  • fDate
    6-13 Nov. 2011
  • Firstpage
    487
  • Lastpage
    494
  • Abstract
    In this paper, we present an event parsing algorithm based on Stochastic Context Sensitive Grammar (SCSG) for understanding events, inferring the goals of agents, and predicting their plausible intended actions. The SCSG represents the hierarchical compositions of events and the temporal relations between the sub-events. The alphabet of the SCSG consists of atomic actions, which are defined by the poses of agents and their interactions with objects in the scene. The temporal relations, which are learned from the training data, are used to distinguish events with similar structures and to interpolate missing portions of events. In comparison with existing methods, our paper makes the following contributions: i) we define atomic actions by a set of relations based on the fluents of agents and their interactions with objects in the scene; ii) our algorithm handles event insertion and multi-agent events, keeps all possible interpretations of the video to preserve the ambiguities, and achieves the globally optimal parsing solution in a Bayesian framework; iii) the algorithm infers the goals of the agents and predicts their intents by a top-down process; iv) the algorithm improves the detection of atomic actions by using event contexts. We show satisfactory results for event recognition and atomic action detection on a data set we captured, which contains 12 event categories in both indoor and outdoor videos.
  • Keywords
    Bayes methods; context-sensitive grammars; inference mechanisms; multi-agent systems; video signal processing; Bayesian framework; atomic actions; event insertion; event recognition; goal inference; indoor videos; intent prediction; multiagent events; optimal parsing solution; outdoor videos; stochastic context sensitive grammar; temporal relations; top-down process; video event parsing algorithm; Atomic clocks; Context; Grammar; Hidden Markov models; Portable computers; Prediction algorithms; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision (ICCV), 2011 IEEE International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1550-5499
  • Print_ISBN
    978-1-4577-1101-5
  • Type
    conf
  • DOI
    10.1109/ICCV.2011.6126279
  • Filename
    6126279