DocumentCode :
2786748
Title :
Person tracking via audio and video fusion
Author :
D'Arca, E. ; Robertson, N.M. ; Hopgood, J.
Author_Institution :
Edinburgh Res. Partnership, Univ. of Edinburgh, Edinburgh, UK
fYear :
2012
fDate :
16-17 May 2012
Firstpage :
1
Lastpage :
6
Abstract :
In this paper we present a joint audio-video (AV) tracker that tracks the active source between two freely moving persons who speak in turn, simulating a meeting scenario under looser constraints. Our tracker differs from existing work in that it requires only a small number of sensors, works when the speaker is not close to the sensors, and relies on simple yet efficient inference techniques for AV processing. The system uses audio and video measurements of the target position on the ground plane to strengthen single-modality predictions that would be weak on their own, because occlusions, clutter, reverberation and speech pauses occur in the test environment. In particular, the inter-microphone signal delays and the target image locations are input to single-modality Bayesian filters, whose proposed likelihoods are multiplied in a Kalman filter to give the final joint AV estimate. Despite the low complexity of the system, results show that the multi-modal tracker does not fail, tolerating video occlusion and intermittent speech (to within 50 cm accuracy) in a non-meeting scenario. The system is evaluated on both single-modality and multi-modality tracking, and the performance improvement given by AV fusion is discussed and quantified, i.e. a 24% improvement on the audio tracker accuracy.
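The fusion step described in the abstract, where the likelihoods proposed by the audio and video filters are multiplied, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each modality reports a Gaussian estimate of the ground-plane position with independent x and y axes, and the function names and all numeric values are hypothetical.

```python
def fuse_gaussians(mean_a, var_a, mean_v, var_v):
    """Multiply two 1-D Gaussian likelihoods; return the product's mean and variance."""
    # Kalman-style gain: the weight given to the second (video) estimate.
    gain = var_a / (var_a + var_v)
    mean = mean_a + gain * (mean_v - mean_a)
    var = (1.0 - gain) * var_a
    return mean, var


def fuse_position(audio, video):
    """Fuse two ground-plane estimates axis by axis (axes assumed independent).

    Each argument has the form ((x, y), (var_x, var_y)).
    """
    (ax, ay), (avx, avy) = audio
    (vx, vy), (vvx, vvy) = video
    x, var_x = fuse_gaussians(ax, avx, vx, vvx)
    y, var_y = fuse_gaussians(ay, avy, vy, vvy)
    return (x, y), (var_x, var_y)


# Illustrative values only (metres and metres squared), not taken from the paper.
audio_estimate = ((2.0, 1.0), (0.5, 0.5))   # noisier audio estimate
video_estimate = ((2.4, 1.2), (0.1, 0.1))   # sharper video estimate
fused_mean, fused_var = fuse_position(audio_estimate, video_estimate)
```

Under these assumptions the fused estimate lies closer to the lower-variance modality, and its variance is smaller than either input variance, which is the sense in which each modality strengthens the other.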
Keywords :
Kalman filters; audio signal processing; speech processing; target tracking; video signal processing; AV final estimation; AV tracker; Kalman Filter; audio fusion; clutter; ground plane; intermittent speech; joint audio-video tracker; multimodal tracker; person tracking; reverberations; single modality Bayesian filters; single modality predictions; target position; test environment; video fusion; AV occlusion; Kalman filtering; audio tracking; multimodal fusion; video tracking;
fLanguage :
English
Publisher :
IET
Conference_Titel :
Data Fusion & Target Tracking Conference (DF&TT 2012): Algorithms & Applications, 9th IET
Conference_Location :
London
Electronic_ISBN :
978-1-84919-624-6
Type :
conf
DOI :
10.1049/cp.2012.0410
Filename :
6253627