• DocumentCode
    32117
  • Title

    Heterogeneous Visual Codebook Integration Via Consensus Clustering for Visual Categorization

  • Author

    Lopez-Sastre, Roberto J. ; Renes-Olalla, Javier ; Gil-Jimenez, Pedro ; Maldonado-Bascon, Saturnino ; Lafuente-Arroyo, Sergio

  • Author_Institution
    Dept. of Signal Theor. & Commun., Univ. of Alcala, Alcalá de Henares, Spain
  • Volume
    23
  • Issue
    8
  • fYear
    2013
  • fDate
    Aug. 2013
  • Firstpage
    1358
  • Lastpage
    1368
  • Abstract
    Most recent category-level object and activity recognition systems work with visual words, i.e., vector-quantized local descriptors. These visual vocabularies are usually built by using a local feature, such as SIFT, and a single clustering algorithm, such as K-means. However, very different clustering algorithms are at our disposal, each of them discovering different structures in the data. In this paper, we explore how to combine these heterogeneous codebooks and introduce a novel approach for their integration via consensus clustering. Considering each visual vocabulary as one modality, we propose the visual word aggregation (VWA) methodology to learn a common codebook, where the stability of the visual vocabulary construction process is increased, the size of the codebook is determined in an unsupervised integration, and more discriminative representations are obtained. With the aim of obtaining contextual visual words, we also incorporate the spatial neighboring relation between the local descriptors into the VWA process: the contextual-VWA approach. We integrate over-segmentation algorithms and spatial grids into the aggregation process to obtain a visual vocabulary that narrows the semantic gap between visual words and visual concepts. We show how the proposed codebooks perform in recognizing objects and scenes on very challenging datasets. Compared with unimodal visual codebook construction approaches, our multimodal approach always achieves superior performance.
  • Keywords
    image representation; object recognition; pattern classification; VWA methodology; activity recognition systems; clustering algorithm; consensus clustering; contextual visual words; discriminative representations; heterogeneous codebooks; object recognition systems; oversegmentation algorithms; spatial grids; spatial neighboring relation; unsupervised integration; visual vocabulary construction process; visual word aggregation; Clustering algorithms; Histograms; Semantics; Vector quantization; Vectors; Visualization; Vocabulary; Clustering aggregation; consensus clustering; object recognition; scene recognition; visual words;
  • fLanguage
    English
  • Journal_Title
    Circuits and Systems for Video Technology, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1051-8215
  • Type

    jour

  • DOI
    10.1109/TCSVT.2013.2243058
  • Filename
    6422366