Title :
A conditional random field approach for audio-visual people diarization
Author :
Gay, Paul ; Khoury, Elie ; Meignier, Sylvain ; Odobez, Jean-Marc ; Deleglise, Paul
Author_Institution :
Idiap Res. Inst., Martigny, Switzerland
Abstract :
We investigate the problem of audio-visual (AV) person diarization in broadcast data, that is, automatically associating the faces and voices of people and determining when they appear or speak in the video. Our contributions are twofold. First, we formulate the problem within a novel conditional random field (CRF) framework that simultaneously performs the AV association of voice and face clusters to build AV person models, and the joint segmentation of the audio and visual streams using a set of AV cues and their association strength. Second, for this AV association strength we use a score that relies not only on lip activity but also on contextual visual information (face size, position, number of detected faces, etc.), which leads to more reliable association measures. Experiments on 6 hours of broadcast data show that our framework improves AV person diarization, especially for speaker segments that are erroneously labeled in the mono-modal case.
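The abstract does not give the model's equations. As a rough, non-authoritative illustration only, the sketch below shows two of its ideas in generic form: a hypothetical logistic association score that mixes lip activity with contextual cues (face size, centrality, number of faces), and MAP (Viterbi) decoding of a linear-chain CRF whose unary potentials add face evidence weighted by that score. The function names (`association_strength`, `build_unary`, `viterbi`), the weights, the feature scaling, the Potts-style switch penalty and the linear-chain structure are all assumptions; the sketch does not perform the joint association of voice and face clusters described above and is not the authors' model.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical weights: (lip activity, face size, centrality, number of faces).
# Illustrative values only, not taken from the paper.
WEIGHTS: Tuple[float, float, float, float] = (2.0, 1.5, 1.0, -0.5)
BIAS = -2.0


def association_strength(lip_activity: float, face_size: float,
                         centrality: float, n_faces: int) -> float:
    """Hypothetical logistic AV association score in (0, 1).

    Features are assumed normalised to [0, 1], except n_faces (a count).
    More visible faces lower the score, since the speaker is less certain.
    """
    w_lip, w_size, w_pos, w_nf = WEIGHTS
    z = (BIAS + w_lip * lip_activity + w_size * face_size
         + w_pos * centrality + w_nf * n_faces)
    return 1.0 / (1.0 + math.exp(-z))


def build_unary(audio: Sequence[Sequence[float]],
                face: Sequence[Sequence[float]],
                assoc: Sequence[float]) -> List[List[float]]:
    """Unary log-potentials: audio score plus face score weighted by assoc[t]."""
    return [[a + w * f for a, f in zip(a_row, f_row)]
            for a_row, f_row, w in zip(audio, face, assoc)]


def viterbi(unary: Sequence[Sequence[float]], switch_penalty: float) -> List[int]:
    """MAP labelling of a linear-chain CRF with a Potts-style pairwise term.

    unary[t][k] is the log-potential of assigning person k to segment t;
    switch_penalty is subtracted whenever consecutive labels differ.
    """
    n_labels = len(unary[0])
    score = list(unary[0])
    back: List[List[int]] = []
    for row in unary[1:]:
        new_score, ptr = [], []
        for k in range(n_labels):
            # Best predecessor j for label k, including the transition cost.
            cand = [score[j] - (0.0 if j == k else switch_penalty)
                    for j in range(n_labels)]
            best_j = max(range(n_labels), key=cand.__getitem__)
            ptr.append(best_j)
            new_score.append(cand[best_j] + row[k])
        score = new_score
        back.append(ptr)
    # Backtrack from the best final label.
    label = max(range(n_labels), key=score.__getitem__)
    path = [label]
    for ptr in reversed(back):
        label = ptr[label]
        path.append(label)
    return path[::-1]


if __name__ == "__main__":
    # Toy data: 4 segments, 2 AV persons. Audio wrongly favours person 0 in segment 2.
    audio = [[0.0, -2.0], [0.0, -2.0], [0.0, -0.5], [-2.0, 0.0]]
    face = [[0.0, -3.0], [0.0, -3.0], [-3.0, 0.0], [-3.0, 0.0]]
    # Close-up, centred, single talking face in every segment.
    assoc = [association_strength(1.0, 1.0, 1.0, 1)] * 4
    print("audio only:", viterbi(audio, 1.0))  # [0, 0, 0, 1]
    print("audio-visual:", viterbi(build_unary(audio, face, assoc), 1.0))  # [0, 0, 1, 1]
```

In this toy case the audio-only decoding mislabels segment 2, while adding face evidence weighted by a high association score relabels it. For a wide shot with several detected faces, the hypothetical score would be lower and the face term would contribute less to the decision.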
Keywords :
audio signal processing; audio-visual systems; broadcast communication; multimedia communication; speech recognition; video signal processing; AV association; AV person models; CRF framework; audio streams segmentation; audio-visual people diarization; audio-visual person diarization; broadcast data; contextual visual information; speaker segments; visual streams segmentation; Biological system modeling; Data models; Error analysis; Lips; Optimization; TV; Visualization; Audiovisual; Conditional Random Field; diarization;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6853569