  • DocumentCode
    6993
  • Title
    Realistic Human Action Recognition With Multimodal Feature Selection and Fusion
  • Author
    Qiuxia Wu ; Zhiyong Wang ; Feiqi Deng ; Zheru Chi ; David Dagan Feng
  • Author_Institution
    Sch. of Autom. Sci. & Eng., South China Univ. of Technol., Guangzhou, China
  • Volume
    43
  • Issue
    4
  • fYear
    2013
  • fDate
    July 2013
  • Firstpage
    875
  • Lastpage
    885
  • Abstract
    Although promising results have been achieved for human action recognition under well-controlled conditions, it is very challenging to recognize human actions in realistic scenarios due to increased difficulties such as dynamic backgrounds. In this paper, we propose to take multimodal (i.e., audiovisual) characteristics of realistic human action videos into account in human action recognition for the first time, since, in realistic scenarios, audio signals accompanying an action generally provide a cue to the nature of the action, such as a phone ringing for answering the phone. In order to cope with the diverse audio cues of an action in realistic scenarios, we propose to identify effective features from a large number of audio features with the generalized multiple kernel learning algorithm. The widely used space-time interest point descriptors are utilized as visual features, and a support vector machine is employed for both audio- and video-based classification. At the final stage, a fuzzy integral is utilized to fuse the recognition results of the audio and visual modalities. Experimental results on the challenging Hollywood-2 Human Action data set demonstrate that the proposed approach achieves a larger improvement in recognition performance than integrating scene context does. Our comprehensive experiments also reveal how audio context influences realistic action recognition.
  • Keywords
    fuzzy set theory; image fusion; image motion analysis; learning (artificial intelligence); support vector machines; video signal processing; audio features; audio modalities; dynamic backgrounds; fuzzy integral; human action videos; multimodal feature fusion; multimodal feature selection; multiple kernel learning algorithm; phone ringing; realistic human action recognition; support vector machine; visual modalities; multimodal fusion; multiple kernel learning (MKL)
  • fLanguage
    English
  • Journal_Title
    Systems, Man, and Cybernetics: Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    2168-2216
  • Type
    jour
  • DOI
    10.1109/TSMCA.2012.2226575
  • Filename
    6493474
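
The abstract's final fusion stage can be illustrated with a small sketch. It assumes the Sugeno form of the fuzzy integral, two sources (audio and video), and per-class scores in [0, 1]. The record does not say which fuzzy integral or which density values the paper uses, so the function name `sugeno_fuse` and the densities below are hypothetical and chosen only for illustration.

```python
def sugeno_fuse(scores, densities):
    """Sugeno fuzzy integral over exactly two sources (a sketch, not the paper's code).

    scores    -- per-source confidence for one class, each in [0, 1]
    densities -- assumed importance of each source on its own, each in [0, 1]
    """
    if len(scores) != 2 or len(densities) != 2:
        raise ValueError("this sketch handles exactly two sources")

    # Rank the two sources by descending confidence.
    top, low = sorted(range(2), key=lambda i: scores[i], reverse=True)

    # Fuzzy measure: g({top}) is that source's density, g({top, low}) is 1.
    g_top = densities[top]
    g_all = 1.0

    # Sugeno integral: max over the ranked sources of min(score, measure).
    return max(min(scores[top], g_top), min(scores[low], g_all))


# Hypothetical scores for one action class: audio is confident, video is not.
audio_score, video_score = 0.9, 0.4
fused = sugeno_fuse([audio_score, video_score], [0.3, 0.7])
print(fused)
```

With these assumed densities the audio source is trusted less than the video source, so a high audio score alone cannot push the fused score above the audio density unless the video score also supports the class.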