Action detection in complex scenes with spatial and temporal ambiguities

Author

Hu, Yuxiao ; Cao, Liangliang ; Lv, Fengjun ; Yan, Shuicheng ; Gong, Yihong ; Huang, Thomas S.

Author_Institution

Dept. of ECE, UIUC, Urbana, IL, USA

fYear

2009

fDate

Sept. 29 2009-Oct. 2 2009

Firstpage

128

Lastpage

135

Abstract

In this paper, we investigate the detection of semantic human actions in complex scenes. Unlike conventional action recognition in well-controlled environments, action detection in complex scenes suffers from cluttered backgrounds, heavy crowds, occluded bodies, and spatial-temporal boundary ambiguities caused by imperfect human detection and tracking. Conventional algorithms are likely to fail under such spatial-temporal ambiguities. In this work, the candidate regions of an action are treated as a bag of instances. A novel multiple-instance learning framework, named SMILE-SVM (Simulated annealing Multiple Instance LEarning Support Vector Machines), is then presented for learning a human action detector from imprecise action locations. SMILE-SVM is extensively evaluated, with satisfactory performance, on two tasks: (1) human action detection on a public video action database with cluttered backgrounds, and (2) a real-world problem of detecting whether customers in a shopping mall show an intention to purchase merchandise on the shelf (even if they didn't buy it eventually). In addition, the complementary nature of motion and appearance features in action detection is validated, which yields improved performance in our experiments.
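
The abstract names two ingredients without giving details: treating candidate regions as a bag of instances, and simulated annealing. As a rough illustration only, the sketch below shows the standard multiple-instance assumption (a bag is positive if at least one of its instances is positive) and the usual Metropolis acceptance step of simulated annealing. It is not the SMILE-SVM algorithm itself; the linear scorer, the names `bag_score`, `predict_bag`, and `metropolis_accept`, and all numeric values are hypothetical.

```python
import math


def dot(w, x):
    """Inner product of two equal-length feature vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def bag_score(w, b, bag):
    """Score a bag by its best-scoring instance (standard MIL assumption).

    `bag` is a list of feature vectors, one per candidate region.
    `w` and `b` are the weights and bias of an assumed linear scorer.
    """
    return max(dot(w, x) + b for x in bag)


def predict_bag(w, b, bag):
    """Label a bag +1 if any instance scores above zero, otherwise -1."""
    return 1 if bag_score(w, b, bag) > 0 else -1


def metropolis_accept(delta, temperature, u):
    """Metropolis rule used in simulated annealing.

    `delta` is the increase in cost caused by a proposed change,
    `temperature` is the current annealing temperature, and `u` is a
    uniform random draw in [0, 1) supplied by the caller.
    Improvements are always accepted; a worsening is accepted with
    probability exp(-delta / temperature).
    """
    if delta <= 0:
        return True
    return u < math.exp(-delta / temperature)


# Hypothetical data: one bag containing a high-scoring candidate region,
# and one bag where every candidate region scores below zero.
weights, bias = [1.0, 1.0], -1.0
positive_bag = [[0.1, 0.2], [0.9, 0.8], [0.3, 0.1]]
negative_bag = [[0.1, 0.2], [0.3, 0.1]]
```

With these made-up values, `predict_bag(weights, bias, positive_bag)` is driven by the second instance alone, which is the point of the bag formulation: the detector does not need to know in advance which candidate region actually contains the action.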

Keywords

object detection; semantic networks; simulated annealing; support vector machines; SMILE-SVM; cluttered backgrounds; human action detection; occluded bodies; real world problem; semantic human actions detection; simulated annealing multiple instance learning support vector machines; spatial-temporal boundary ambiguities; video action database; Computer vision; Detectors; Humans; Layout; Machine learning; Merchandise; Performance evaluation; Simulated annealing; Spatial databases; Support vector machines;

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer Vision, 2009 IEEE 12th International Conference on

Conference_Location

Kyoto

ISSN

1550-5499

Print_ISBN

978-1-4244-4420-5

Electronic_ISBN

1550-5499

Type

conf

DOI

10.1109/ICCV.2009.5459153

Filename

5459153