• DocumentCode
    38992
  • Title
    Learning Human Actions by Combining Global Dynamics and Local Appearance
  • Author
    Guan Luo ; Shuang Yang ; Guodong Tian ; Chunfeng Yuan ; Weiming Hu ; Stephen J. Maybank
  • Author_Institution
    National Laboratory of Pattern Recognition, Institute of Automation, Beijing, China
  • Volume
    36
  • Issue
    12
  • fYear
    2014
  • fDate
    Dec. 1 2014
  • Firstpage
    2466
  • Lastpage
    2482
  • Abstract
    In this paper, we address the problem of human action recognition by combining global temporal dynamics with local visual spatio-temporal appearance features. In the global temporal dimension, we propose to model the motion dynamics with robust linear dynamical systems (LDSs) and use the model parameters as motion descriptors. Since LDSs live in a non-Euclidean space and the descriptors are in non-vector form, we propose a shift-invariant, subspace-angle-based distance to measure the similarity between LDSs. In the local visual dimension, we construct curved spatio-temporal cuboids along the trajectories of densely sampled feature points and describe them using histograms of oriented gradients (HOG). The distance between motion sequences is computed with the Chi-Squared histogram distance in the bag-of-words framework. Finally, we perform classification using the maximum margin distance learning method, combining the global dynamic distances and the local visual distances. We evaluate our approach for action recognition on five short-clip data sets, namely Weizmann, KTH, UCF sports, Hollywood2 and UCF50, as well as three long continuous data sets, namely VIRAT, ADL and CRIM13. The results are competitive with current state-of-the-art methods.
  • Keywords
    image classification; image sequences; learning (artificial intelligence); motion estimation; spatiotemporal phenomena; ADL data set; CRIM13 data set; Chi-Squared histogram distance; HOG; Hollywood2 data set; KTH data set; LDS; UCF sports data set; UCF50 data set; VIRAT data set; Weizmann data set; bag-of-words framework; classification; curved spatio-temporal cuboids; densely-sampled feature point trajectories; global dynamic distances; global temporal dimension; global temporal dynamics; histogram-of-oriented gradients; human action learning; human action recognition; local visual dimension; local visual distances; local visual spatio-temporal appearance features; long-continuous data sets; maximum margin distance learning method; model parameters; motion descriptors; motion dynamics; motion sequences; nonEuclidean space; nonvector descriptors; robust linear dynamical systems; shift invariant subspace angle-based distance; short-clip data sets; similarity measurement; Behavioral science; Computer aided instruction; Feature extraction; Hidden Markov models; Histograms; Human factors; Action recognition; distance learning; linear dynamical system; local spatio-temporal feature; non-vector descriptor;
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2014.2329301
  • Filename
    6826537
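
The abstract states that distances between motion sequences are computed with the Chi-Squared histogram distance over bag-of-words histograms. As a minimal sketch of that idea, assuming the common symmetric form d(h, g) = 0.5 * sum_i (h_i - g_i)^2 / (h_i + g_i) on L1-normalized histograms (the record does not give the paper's exact normalization, and the function names here are hypothetical):

```python
def l1_normalize(hist):
    """Scale a non-negative histogram so its bins sum to 1 (assumed preprocessing)."""
    total = float(sum(hist))
    if total == 0.0:
        return [0.0 for _ in hist]
    return [v / total for v in hist]


def chi_squared_distance(h, g, eps=1e-12):
    """Symmetric Chi-Squared distance between two histograms of equal length.

    Bins where both histograms are (numerically) empty are skipped to avoid
    division by zero; this convention is an assumption, not taken from the paper.
    """
    if len(h) != len(g):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(h, g):
        s = a + b
        if s > eps:
            total += (a - b) ** 2 / s
    return 0.5 * total


# Usage: two visual-word count histograms from two hypothetical video clips.
clip_a = l1_normalize([4, 0, 3, 1])
clip_b = l1_normalize([2, 2, 3, 1])
distance = chi_squared_distance(clip_a, clip_b)
```

With L1-normalized inputs this distance lies between 0 (identical histograms) and 1 (histograms with no shared non-empty bins), which makes it convenient to combine with other distances, as the abstract describes for the global dynamic and local visual terms.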