• DocumentCode
    3716090
  • Title

    An efficient audiovisual saliency model to predict eye positions when looking at conversations

  • Author

    Antoine Coutrot;Nathalie Guyader

  • Author_Institution
    CoMPLEX, University College London, London, United Kingdom
  • fYear
    2015
  • Firstpage
    1531
  • Lastpage
    1535
  • Abstract
    Classic models of visual attention dramatically fail at predicting eye positions on visual scenes involving faces. While some recent models combine faces with low-level features, none of them consider sound as an input. Yet it is crucial in conversation or meeting scenes. In this paper, we describe and refine an audiovisual saliency model for conversation scenes. This model includes a speaker diarization algorithm which automatically modulates the saliency of conversation partners' faces and bodies according to their speaking-or-not status. To merge our different features into a master saliency map, we use an efficient statistical method (Lasso) allowing a straightforward interpretation of feature relevance. To train and evaluate our model, we run an eye tracking experiment on a publicly available meeting videobase. We show that increasing the saliency of speakers' faces (but not bodies) greatly improves the predictions of our model, compared to previous ones giving an equal and constant weight to each conversation partner.
  • Keywords
    "Visualization","Signal processing algorithms","Heuristic algorithms","Statistical analysis","Europe","Signal processing","Speech"
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing Conference (EUSIPCO), 2015 23rd European
  • Electronic_ISBN
    2076-1465
  • Type

    conf

  • DOI
    10.1109/EUSIPCO.2015.7362640
  • Filename
    7362640