• DocumentCode
    11357
  • Title

    Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

  • Author

    He, Kaiming ; Zhang, Xiangyu ; Ren, Shaoqing ; Sun, Jian

  • Author_Institution
    Visual Computing Group, Microsoft Research, Beijing, China
  • Volume
    37
  • Issue
    9
  • fYear
    2015
  • fDate
    Sept. 1 2015
  • Firstpage
    1904
  • Lastpage
    1916
  • Abstract
    Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224×224) input image. This requirement is “artificial” and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, “spatial pyramid pooling”, to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102× faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvement made for this competition.
  • Keywords
    Accuracy; Agriculture; Convolutional codes; Feature extraction; Testing; Training; Vectors; Convolutional neural networks; Image classification; Object detection; Spatial pyramid pooling
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/TPAMI.2015.2389824
  • Filename
    7005506