  • DocumentCode
    3605677
  • Title
    Large-Margin Multi-Modal Deep Learning for RGB-D Object Recognition
  • Author
    Anran Wang ; Jiwen Lu ; Jianfei Cai ; Tat-Jen Cham ; Gang Wang
  • Author_Institution
    Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
  • Volume
    17
  • Issue
    11
  • fYear
    2015
  • Firstpage
    1887
  • Lastpage
    1898
  • Abstract
    Most existing feature learning-based methods for RGB-D object recognition either combine RGB and depth data in an undifferentiated manner from the outset, or learn features from color and depth separately. Neither approach adequately exploits the different characteristics of the two modalities or utilizes the relationship shared between them. In this paper, we propose a general CNN-based multi-modal learning framework for RGB-D object recognition. We first construct deep CNN layers for color and depth separately, which are then connected with a carefully designed multi-modal layer. This layer is designed not only to discover the most discriminative features for each modality, but also to harness the complementary relationship between the two modalities. The results of the multi-modal layer are back-propagated to update the parameters of the CNN layers, and the multi-modal feature learning and the back-propagation are performed iteratively until convergence. Experimental results on two widely used RGB-D object datasets show that our method for general multi-modal learning achieves performance comparable to state-of-the-art methods specifically designed for RGB-D data.
  • Keywords
    backpropagation; convergence; convolution; learning (artificial intelligence); neural nets; object recognition; CNN layers; RGB data; RGB-D object recognition; convolutional neural networks; depth data; feature learning-based methods; large-margin multimodal deep learning; Correlation; Feature extraction; Image color analysis; Labeling; Machine learning; Neural networks; Deep learning; large-margin feature learning; multi-modality
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2015.2476655
  • Filename
    7258382