• DocumentCode
    3672375
  • Title
    Scene classification with semantic Fisher vectors
  • Author
    Mandar Dixit; Si Chen; Dashan Gao; Nikhil Rasiwasia; Nuno Vasconcelos
  • Author_Institution
    University of California, San Diego, USA
  • fYear
    2015
  • fDate
    6/1/2015 12:00:00 AM
  • Firstpage
    2974
  • Lastpage
    2983
  • Abstract
    With the help of a convolutional neural network (CNN) trained to recognize objects, a scene image is represented as a bag of semantics (BoS). Image patches are classified by the network, and the resulting class posterior probability vectors are treated as locally extracted semantic descriptors. The image BoS is summarized with a Fisher vector (FV) embedding that exploits the properties of the space of these descriptors; the resulting representation is referred to as a semantic Fisher vector. Two implementations of a semantic FV are investigated. The first models the BoS with a Dirichlet mixture and computes the Fisher gradients for this model. Because mixture modeling on the non-Euclidean probability simplex is difficult, this approach is shown to be unsuccessful. The second implementation interprets semantic descriptors as the parameters of a multinomial distribution. Like the parameters of any exponential family, these can be projected into their natural parameter space. For a CNN, this is shown to be equivalent to using the inputs of its soft-max layer as patch descriptors. A semantic FV is then computed as a Gaussian mixture FV in the space of these natural parameters. This representation is shown to outperform alternatives such as FVs of features from intermediate CNN layers or a classifier obtained by adapting (fine-tuning) the CNN. Because the proposed FV is an embedding of object classification probabilities, it is complementary, as an image representation, to the features obtained from a scene classification CNN. A combination of the two representations is shown to achieve state-of-the-art results on the MIT Indoor Scenes and SUN datasets.
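    The abstract's key step, projecting class posteriors to multinomial natural parameters and then encoding them with a Gaussian mixture FV, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the natural parameters use the last class as the reference (log-odds against class K), and the FV keeps only the gradient with respect to the component means of a diagonal-covariance GMM, omitting variance gradients and any post-processing. The example checks the abstract's claim that the natural parameters recover the soft-max inputs up to an additive constant.

```python
import math


def softmax(z):
    """Numerically stable soft-max: class posteriors from CNN logits z."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def natural_params(p):
    """Multinomial natural parameters nu_k = log(p_k / p_K).

    Assumption: the last class K is the reference. If p = softmax(z),
    then nu_k = z_k - z_K, i.e. the soft-max inputs shifted so the
    last entry is zero.
    """
    ref = math.log(p[-1])
    return [math.log(pk) - ref for pk in p]


def gmm_fv_means(X, weights, means, sigmas):
    """Mean-gradient part of a Gaussian mixture Fisher vector (sketch).

    X: list of T descriptors, each a list of D floats.
    weights, means, sigmas: parameters of a K-component diagonal GMM.
    Returns a flat list of K * D values:
      (1 / (T * sqrt(w_k))) * sum_t gamma_t(k) * (x_t - mu_k) / sigma_k
    """
    T, K, D = len(X), len(weights), len(X[0])
    acc = [[0.0] * D for _ in range(K)]
    for x in X:
        # Log of w_k * N(x | mu_k, diag(sigma_k^2)) for each component.
        logs = []
        for k in range(K):
            ll = math.log(weights[k])
            for d in range(D):
                u = (x[d] - means[k][d]) / sigmas[k][d]
                ll += -0.5 * u * u - math.log(sigmas[k][d]) - 0.5 * math.log(2 * math.pi)
            logs.append(ll)
        # Posterior responsibilities gamma_t(k), normalized stably.
        m = max(logs)
        ex = [math.exp(l - m) for l in logs]
        s = sum(ex)
        gamma = [e / s for e in ex]
        # Accumulate responsibility-weighted, whitened deviations.
        for k in range(K):
            for d in range(D):
                acc[k][d] += gamma[k] * (x[d] - means[k][d]) / sigmas[k][d]
    fv = []
    for k in range(K):
        scale = 1.0 / (T * math.sqrt(weights[k]))
        fv.extend(v * scale for v in acc[k])
    return fv


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]
    nu = natural_params(softmax(logits))
    print(nu)  # approximately [3.0, 1.5, 0.0], i.e. logits minus logits[-1]
```

    In this sketch each patch's natural-parameter vector would serve as one row of X; the GMM parameters are assumed to have been fitted beforehand on such vectors from training images.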
  • Keywords
    "Semantics","Feature extraction","Image representation","Principal component analysis","Visualization","Object recognition","Encoding"
  • Publisher
    ieee
  • Conference_Titel
    Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on
  • Electronic_ISBN
    1063-6919
  • Type
    conf
  • DOI
    10.1109/CVPR.2015.7298916
  • Filename
    7298916