Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study

Author

Borji, A. ; Sihite, D.N. ; Itti, L.

Author_Institution

Dept. of Comput. Sci., Univ. of Southern California, Los Angeles, CA, USA

Volume

22

Issue

1

fYear

2013

fDate

Jan. 2013

Firstpage

55

Lastpage

69

Abstract

Visual attention is a process that enables biological and machine vision systems to select the most relevant regions from a scene. Relevance is determined by two components: 1) top-down factors driven by task and 2) bottom-up factors that highlight image regions that are different from their surroundings. The latter are often referred to as “visual saliency.” Modeling bottom-up visual saliency has been the subject of numerous research efforts during the past 20 years, with many successful applications in computer vision and robotics. Available models have been tested with different datasets (e.g., synthetic psychological search arrays, natural images or videos) using different evaluation scores (e.g., search slopes, comparison to human eye tracking) and parameter settings. This has made direct comparison of models difficult. Here, we perform an exhaustive comparison of 35 state-of-the-art saliency models over 54 challenging synthetic patterns, three natural image datasets, and two video datasets, using three evaluation scores. We find that although model rankings vary, some models consistently perform better. Analysis of datasets reveals that existing datasets are highly center-biased, which influences some of the evaluation scores. Computational complexity analysis shows that some models are very fast, yet yield competitive eye movement prediction accuracy. Different models often have common easy/difficult stimuli. Furthermore, several concerns in visual saliency modeling, eye movement datasets, and evaluation scores are discussed and insights for future work are provided. Our study allows one to assess the state of the art, helps organize this rapidly growing field, and sets a unified comparison framework for gauging future efforts, similar to the PASCAL VOC challenge in the object recognition and detection domains.

Keywords

computational complexity; computer vision; object detection; object recognition; PASCAL VOC challenge; bottom-up visual saliency; eye movement prediction; human-model agreement; machine vision; visual attention; visual saliency modeling; Analytical models; Computational modeling; Government; Humans; Predictive models; Videos; Visualization; Bottom-up attention; model comparison; visual saliency; Area Under Curve; Attention; Computational Biology; Databases, Factual; Eye Movements; Image Processing, Computer-Assisted; Models, Statistical; Photic Stimulation; Video Recording

fLanguage

English

Journal_Title

Image Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1057-7149

Type

jour

DOI

10.1109/TIP.2012.2210727

Filename

6253254