Title :
Deep Head Pose: Gaze-Direction Estimation in Multimodal Video
Author :
Mukherjee, Sankha S. ; Robertson, Neil Martin
Author_Institution :
Visionlab, Heriot-Watt Univ., Edinburgh, UK
Abstract :
In this paper we present a convolutional neural network (CNN)-based model for human head pose estimation in low-resolution multimodal RGB-D data. We pose the problem as one of classifying human gaze direction. We then fine-tune a regressor based on the learned deep classifier, and combine the two models (classification and regression) to estimate an approximate regression confidence. We present state-of-the-art results on datasets ranging from high-resolution human-robot interaction data (close-up faces plus depth information) to challenging low-resolution outdoor surveillance data. Building upon our robust head-pose estimation, we further introduce a new visual attention model to recover interaction with the environment. Using this probabilistic model, we show that many higher-level scene understanding tasks, such as human-human and human-scene interaction detection, can be achieved. Our solution runs in real time on commercial hardware.
Keywords :
approximation theory; estimation theory; feedforward neural nets; human-robot interaction; image classification; image colour analysis; learning (artificial intelligence); pose estimation; probability; regression analysis; robot vision; video surveillance; CNN-based model; approximate regression confidence estimation; convolutional neural network-based model; deep head pose; gaze-direction estimation; high-resolution human robot interaction; human gazing direction classification; human head pose estimation; human-human interaction detection; learned deep classifier; low resolution outdoor surveillance data; low-resolution multimodal RGB-D data; multimodal video; probabilistic model; scene interaction detection; visual attention model; Estimation; Head; Human computer interaction; Image resolution; Magnetic heads; Surveillance; Visualization; Convolutional neural networks (CNNs); RGB-D; deep learning; gaze direction; head-pose;
Journal_Title :
Multimedia, IEEE Transactions on
DOI :
10.1109/TMM.2015.2482819