Feature Extraction and Representation for Distributed Multi-View Human Action Recognition

Author

Jiajia Luo ; Wei Wang ; Hairong Qi

Author_Institution

Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA

Volume

3

Issue

2

fYear

2013

fDate

June 2013

Firstpage

145

Lastpage

154

Abstract

Multi-view human action recognition has gained considerable attention in recent years for its superior performance compared to single-view recognition. In this paper, we propose a new framework for real-time human action recognition in distributed camera networks (DCNs). We first present a new feature descriptor (Mltp-hist) that is tolerant to illumination change, robust in homogeneous regions, and computationally efficient. Taking advantage of the proposed Mltp-hist, the noninformative 3-D patches generated from the background can be removed automatically, which effectively highlights the foreground patches. Next, a new feature representation method based on sparse coding is presented to generate the histogram representation of local videos, which is transmitted to the base station for classification. Owing to the sparse representation of the extracted features, the approximation error is reduced. Finally, at the base station, a probability model is constructed to fuse the information from the various views, and a class label is assigned accordingly. Compared to existing algorithms, the proposed framework offers three advantages while requiring less memory and bandwidth: 1) no preprocessing is required; 2) communication among cameras is unnecessary; and 3) the positions and orientations of the cameras do not need to be fixed. We further evaluate the proposed framework on IXMAS, the most popular multi-view action dataset. Experimental results indicate that our framework consistently achieves state-of-the-art results when various numbers of views are tested. In addition, our approach is tolerant to various combinations of views and benefits from introducing more views at the testing stage. In particular, our results remain satisfactory even when a large misalignment exists between the training and testing samples.
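The abstract does not define Mltp-hist, so the following is only an illustrative sketch of the general idea it builds on, not the authors' descriptor. It implements the standard local ternary pattern (LTP) operator: each neighbour is compared with the centre pixel using a threshold t, and the resulting ternary pattern is split into "upper" and "lower" binary codes that are histogrammed. Because codes depend only on pixel differences, an additive brightness change leaves them unchanged, and small fluctuations in flat regions fall inside the ±t band and map to zero. The threshold value t=5, the neighbour ordering, and the helper `is_noninformative` (which flags a patch as background when every code is zero) are hypothetical choices made for this example.

```python
from collections import Counter

# 8-neighbourhood offsets, clockwise from top-left (assumed ordering).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def ltp_codes(frame, t=5):
    """Return (upper, lower) 8-bit LTP codes for every interior pixel.

    A neighbour p of centre c is coded +1 if p >= c + t, -1 if p <= c - t,
    and 0 otherwise; +1 bits go to the upper code, -1 bits to the lower code.
    """
    h, w = len(frame), len(frame[0])
    codes = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = frame[y][x]
            upper = lower = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                p = frame[y + dy][x + dx]
                if p >= c + t:
                    upper |= 1 << bit
                elif p <= c - t:
                    lower |= 1 << bit
            codes.append((upper, lower))
    return codes


def ltp_histogram(frame, t=5):
    """Concatenate 256-bin histograms of the upper and lower codes (512 bins)."""
    codes = ltp_codes(frame, t)
    up = Counter(u for u, _ in codes)
    low = Counter(lw for _, lw in codes)
    return [up[i] for i in range(256)] + [low[i] for i in range(256)]


def is_noninformative(patch_frames, t=5):
    """Hypothetical background test: every pixel of every frame codes to (0, 0)."""
    return all(code == (0, 0) for f in patch_frames for code in ltp_codes(f, t))


if __name__ == "__main__":
    flat = [[100, 102, 99], [101, 100, 98], [100, 103, 101]]  # noisy flat region
    edge = [[100, 100, 200]] * 3                              # vertical edge
    bright = [[v + 50 for v in row] for row in edge]          # same edge, brighter
    print(ltp_codes(flat), ltp_codes(edge), ltp_codes(bright))
    print(is_noninformative([flat]), is_noninformative([flat, edge]))
```

Splitting the ternary code into two binary codes keeps each histogram at 256 bins instead of 3^8 = 6561, which is one common way to keep such descriptors cheap to compute and transmit.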

Keywords

feature extraction; gesture recognition; IXMAS; Mltp-hist; approximation error; base station; distributed camera networks; distributed multiview human action recognition; extracted features; feature descriptor; feature extraction; feature representation; histogram representation; homogeneous region; illumination change; multiview action dataset; noninformative 3D patches; probability model; real-time realization; single view recognition; sparse coding; sparse representation; Feature extraction; multiview human action recognition; object recognition; sparse coding

fLanguage

English

Journal_Title

IEEE Journal on Emerging and Selected Topics in Circuits and Systems

Publisher

IEEE

ISSN

2156-3357

Type

jour

DOI

10.1109/JETCAS.2013.2256824

Filename

6507587