Detecting person presence in TV shows with linguistic and structural features

Author

Bechet, Frederic ; Favre, Benoit ; Damnati, Geraldine

Author_Institution

LIF, Aix Marseille Univ., Marseille, France

fYear

2012

fDate

25-30 March 2012

Firstpage

5077

Lastpage

5080

Abstract

Person detection and recognition in videos is a hard problem due to the intrinsic ambiguities of the sound and image channels and of their interaction. Whatever method is used to extract person hypotheses from the audio or image channels, person recognition in videos relies on a multimodal decision process that merges the hypotheses produced in order to decide, for each frame, who is present in the video at the audio level, at the image level, or at the content level (a person mentioned in speech or in inserted text boxes). Within this framework, this paper focuses on producing a list of person presence hypotheses from the audio channel of a video document alone, to be combined by a multimodal fusion process with person presence detected at the image level. Two kinds of features are used: linguistic features, corresponding to the way a person is mentioned by a speaker, and structural features, corresponding to the context in which a name occurs in a show. We show that the two feature sets are complementary and that good results can be achieved on a TV show corpus annotated with person presence labels.

Keywords

feature extraction; image fusion; image recognition; natural language processing; speech recognition; video signal processing; TV shows; audio channel; image channel ambiguity; linguistic features; multimodal decision process; multimodal fusion process; person hypotheses extraction; person presence detection; person recognition; sound channel ambiguity; structural features; video document; Face; Feature extraction; Pragmatics; Speech; Speech recognition; TV; Videos; Boosting; Identification of persons; Named Entity; Spoken Language Understanding;

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location

Kyoto

ISSN

1520-6149

Print_ISBN

978-1-4673-0045-2

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2012.6289062

Filename

6289062