Regional Information Center for Science and Technology - Deep Head Pose: Gaze-Direction Estimation in Multimodal Video

DocumentCode :

3607174

Title :

Deep Head Pose: Gaze-Direction Estimation in Multimodal Video

Author :

Mukherjee, Sankha S. ; Robertson, Neil Martin

Author_Institution :

Visionlab, Heriot-Watt Univ., Edinburgh, UK

Volume :

Issue :

fYear :

2015

Firstpage :

2094

Lastpage :

2107

Abstract :

In this paper, we present a convolutional neural network (CNN)-based model for human head-pose estimation in low-resolution multimodal RGB-D data. We pose the problem as one of classifying human gaze direction. We then fine-tune a regressor based on the learned deep classifier, and combine the two models (classification and regression) to estimate an approximate regression confidence. We present state-of-the-art results on datasets that range from high-resolution human-robot interaction data (close-up faces plus depth information) to challenging low-resolution outdoor surveillance data. Building on this robust head-pose estimation, we introduce a new visual attention model to recover interaction with the environment. Using this probabilistic model, we show that many higher-level scene understanding tasks, such as human-human and human-scene interaction detection, can be achieved. Our solution runs in real time on commercial hardware.

Keywords :

approximation theory; estimation theory; feedforward neural nets; human-robot interaction; image classification; image colour analysis; learning (artificial intelligence); pose estimation; probability; regression analysis; robot vision; video surveillance; CNN-based model; approximate regression confidence estimation; convolutional neural network-based model; deep head pose; gaze-direction estimation; high-resolution human robot interaction; human gazing direction classification; human head pose estimation; human-human interaction detection; learned deep classifier; low resolution outdoor surveillance data; low-resolution multimodal RGB-D data; multimodal video; probabilistic model; scene interaction detection; visual attention model; Estimation; Head; Human computer interaction; Image resolution; Magnetic heads; Surveillance; Visualization; Convolutional neural networks (CNNs); RGB-D; deep learning; gaze direction; head-pose;

fLanguage :

English

Journal_Title :

Multimedia, IEEE Transactions on

Publisher :

IEEE

ISSN :

1520-9210

Type :

jour

DOI :

10.1109/TMM.2015.2482819

Filename :

7279167

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3607174