• DocumentCode
    254113
  • Title
    DL-SFA: Deeply-Learned Slow Feature Analysis for Action Recognition
  • Author
    Lin Sun ; Kui Jia ; Tsung-Han Chan ; Yuqiang Fang ; Gang Wang ; Shuicheng Yan
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Nat. Univ. of Singapore, Singapore, Singapore
  • fYear
    2014
  • fDate
    23-28 June 2014
  • Firstpage
    2625
  • Lastpage
    2632
  • Abstract
    Most previous work on video action recognition uses complex hand-designed local features, such as SIFT, HOG and SURF, but these approaches are complicated to implement and difficult to extend to other sensor modalities. Recent studies have found that no hand-engineered feature is universally best across all datasets, and that learning features directly from the data may be more advantageous. One such endeavor is Slow Feature Analysis (SFA), proposed by Wiskott and Sejnowski [33]. SFA can learn invariant and slowly varying features from input signals and has proved valuable in human action recognition [34]. Multi-layer feature representations have also succeeded remarkably in a wide range of machine learning applications. In this paper, we propose to combine SFA with deep learning techniques to learn hierarchical representations from the video data itself. Specifically, we use a two-layered SFA learning structure with 3D convolution and max pooling operations to scale up the method to large inputs and capture abstract and structural features from the video. Thus, the proposed method is suitable for action recognition. At the same time, sharing the merits of deep learning, the proposed method is generic and fully automated. Our classification results on Hollywood2, KTH and UCF Sports are competitive with previously published results. In particular, on the KTH dataset, our recognition rate shows approximately 1% improvement over state-of-the-art methods, even without supervision or dense sampling.
  • Keywords
    convolution; feature extraction; image recognition; image representation; learning (artificial intelligence); 3D convolution; DL-SFA; Hollywood2; KTH dataset; UCF Sports; complex hand-designed local features; deeply-learned slow feature analysis; hierarchical representations; human action recognition; invariant features; machine learning; max pooling operations; multilayer feature representation; sensor modalities; slowly varying features; video action recognition; video data; Abstracts; Convolution; Feature extraction; Kernel; Three-dimensional displays; Video sequences; Visualization; action recognition; deep learning; slow feature analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on
  • Conference_Location
    Columbus, OH
  • Type
    conf
  • DOI
    10.1109/CVPR.2014.336
  • Filename
    6909732