Abstract :
Saliency in 2D imagery has received increasing attention over the last few years, owing to the need to reduce computational requirements by narrowing the visual search space, especially in domestic robotics. Saliency and pre-attention mechanisms such as the Itti-Koch model have largely focused on multi-scale local features that mimic low-level attention processes in the visual system, without regard for the semantic content of the scene and therefore without cognitive grounding in visual processing. The k-TR theory presents the first attempt at a true cognitive understanding of scenes by explaining visual perception and object recognition in terms of Recognition by Component Affordances (RBCA). The k-TR model presents a bi-layer recognition process that combines local, global, semantic and affordance features. The theory draws on psychophysical, neurobiological, linguistic and evolutionary studies for support and explains the recognition of over 250 categories of common household objects. The features used by k-TR for object representation, termed k-TRONs, are available from the public Affordance Network database (AfNet). In this paper, we use the k-TRON features, in particular the 35+ affordance features, to incorporate semantic context into saliency models. Saliency, or surprise, for pre-attention is modeled in the form of affordance aberrations. By using affordance aberration features for conspicuity map generation, we show that the resulting saliency and attention points more closely resemble the salient or surprise regions generated by the human visual system, and hence provide superior performance in comparison to the Itti framework. Furthermore, by learning affordance affinities from test subjects, the degree of influence of each affordance aberration on visual saliency is estimated and incorporated into the overall saliency model.
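As an illustration only, the idea of weighting each affordance aberration by a learned degree of influence can be sketched as a weighted sum of per-affordance conspicuity maps. This is a minimal sketch under stated assumptions, not the method of the paper: the affordance labels (`containability`, `sittability`), the min-max normalization, the linear combination, and the function names `normalize`, `combine_conspicuity` and `most_salient` are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

Map = List[List[float]]  # a 2D conspicuity or saliency map, row-major


def normalize(m: Map) -> Map:
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def combine_conspicuity(maps: Dict[str, Map], weights: Dict[str, float]) -> Map:
    """Weighted sum of normalized conspicuity maps, one map per affordance.

    The weights stand in for learned affordance affinities (an assumption);
    they are rescaled to sum to 1 before the maps are combined.
    """
    names = list(maps)
    total = sum(weights[n] for n in names)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    first = maps[names[0]]
    out = [[0.0 for _ in row] for row in first]
    for n in names:
        norm = normalize(maps[n])
        w = weights[n] / total
        for r, row in enumerate(norm):
            for c, v in enumerate(row):
                out[r][c] += w * v
    return out


def most_salient(saliency: Map) -> Tuple[int, int]:
    """Return the (row, col) position of the highest saliency value."""
    cells = [(r, c) for r, row in enumerate(saliency) for c in range(len(row))]
    return max(cells, key=lambda rc: saliency[rc[0]][rc[1]])


# Hypothetical aberration maps for two affordances on a 2x2 grid.
maps = {
    "containability": [[0.0, 0.0], [0.0, 4.0]],
    "sittability": [[2.0, 0.0], [0.0, 0.0]],
}
# Hypothetical affinities; in the paper these are learned from test subjects.
weights = {"containability": 3.0, "sittability": 1.0}

saliency = combine_conspicuity(maps, weights)
print(most_salient(saliency))
```

In this toy setup the more heavily weighted affordance dominates the combined map, so the attention point falls where its aberration is strongest.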
Keywords :
feature extraction; image representation; object recognition; visual perception; 2D imagery; AfNet; Itti-Koch model; RBCA; affordance aberrations; affordance features; affordance network database; bilayer recognition process; cognitive scene understanding; computation requirement minimization; conspicuity map generation; domestic robotics; evolutionary studies; global features; household object recognition; human visual system; k-TR Theory; k-TRON features; linguistic studies; neurobiological studies; object representation; preattention mechanisms; psychophysical studies; recognition of component affordances; salient regions; semantic features; semantic saliency; surprise regions; visual processing; visual saliency; visual search space reduction; Detectors; Filtration; Humans; Search problems; Semantics; Visualization