DocumentCode :
27921
Title :
Learning Deep Hierarchical Visual Feature Coding
Author :
Hanlin Goh ; Thome, Nicolas ; Cord, Matthieu ; Joo-Hwee Lim
Author_Institution :
Inst. for Infocomm Res., Agency for Sci., Technol. & Res., Singapore, Singapore
Volume :
25
Issue :
12
fYear :
2014
fDate :
Dec. 2014
Firstpage :
2212
Lastpage :
2225
Abstract :
In this paper, we propose a hybrid architecture that combines the image modeling strengths of the bag-of-words framework with the representational power and adaptability of learning deep architectures. Local gradient-based descriptors, such as SIFT, are encoded via a hierarchical coding scheme composed of spatial aggregating restricted Boltzmann machines (RBMs). For each coding layer, we regularize the RBM by encouraging representations to fit both sparse and selective distributions. Supervised fine-tuning is used to enhance the quality of the visual representation for the categorization task. We performed a thorough experimental evaluation using three image categorization data sets. The hierarchical coding scheme achieved competitive categorization accuracies of 79.7% and 86.4% on the Caltech-101 and 15-Scenes data sets, respectively. The visual representations learned are compact and the model's inference is fast, as compared with sparse coding methods. The low-level representations of descriptors that were learned using this method result in generic features that we empirically found to be transferable between different image data sets. Further analysis reveals the significance of supervised fine-tuning when the architecture has two layers of representations as opposed to a single layer.
Keywords :
Boltzmann machines; gradient methods; image coding; image representation; learning (artificial intelligence); 15-scenes data set; Caltech-101; RBM; SIFT; bag of words framework; categorization task; coding layer; competitive categorization accuracy; deep hierarchical visual feature coding; experimental evaluation; hierarchical coding scheme; hybrid architecture; image categorization data set; image data set; image modeling strength; learning deep architecture; local gradient-based descriptor; representational power; selective distribution; sparse coding method; sparse distribution; spatial aggregating restricted Boltzmann machines; supervised fine-tuning; visual representation; Adaptation models; Computer architecture; Dictionaries; Encoding; Learning systems; Neural networks; Visualization; Bag-of-words (BoW) framework; computer vision; deep learning; dictionary learning; hierarchical visual architecture; image categorization; restricted Boltzmann machine (RBM); sparse feature coding; transfer learning
fLanguage :
English
Journal_Title :
Neural Networks and Learning Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
2162-237X
Type :
jour
DOI :
10.1109/TNNLS.2014.2307532
Filename :
6763041