• DocumentCode
    3167099
  • Title
    Detecting person presence in TV shows with linguistic and structural features
  • Author
    Bechet, Frederic ; Favre, Benoit ; Damnati, Geraldine
  • Author_Institution
    LIF, Aix Marseille Univ., Marseille, France
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    5077
  • Lastpage
    5080
  • Abstract
    Person detection and recognition in videos is a hard problem because of the intrinsic ambiguities of the sound and image channels and of their interaction. Whatever method is used to extract person hypotheses from the audio or image channels, person recognition in videos relies on a multimodal decision process that merges these hypotheses to decide, for each frame, who is present in the video at the audio level, at the image level, or at the content level (a person mentioned in speech or in inserted text boxes). Within this framework, this paper aims to produce a list of person presence hypotheses from the audio channel of a video document alone, to be used by a multimodal fusion process alongside person presence detected at the image level. We use two kinds of features: linguistic features, corresponding to the way a person is mentioned by a speaker, and structural features, corresponding to the context in which a name occurs in a show. We show that the two feature sets are complementary and that good results can be achieved on a TV show corpus annotated with person presence labels.
  • Keywords
    feature extraction; image fusion; image recognition; natural language processing; speech recognition; video signal processing; TV shows; audio channel; image channel ambiguity; linguistic features; multimodal decision process; multimodal fusion process; person hypotheses extraction; person presence detection; person recognition; sound channel ambiguity; structural features; video document; Face; Feature extraction; Pragmatics; Speech; Speech recognition; TV; Videos; Boosting; Identification of persons; Named Entity; Spoken Language Understanding;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6289062
  • Filename
    6289062