• DocumentCode
    8910
  • Title
    Learning Compact Hash Codes for Multimodal Representations Using Orthogonal Deep Structure
  • Author
    Daixin Wang ; Peng Cui ; Mingdong Ou ; Wenwu Zhu
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • Volume
    17
  • Issue
    9
  • fYear
    2015
  • fDate
    Sept. 2015
  • Firstpage
    1404
  • Lastpage
    1416
  • Abstract
    As large-scale multimodal data are ubiquitous in many real-world applications, learning multimodal representations for efficient retrieval is a fundamental problem. Most existing methods adopt shallow structures to perform multimodal representation learning. Because shallow structures have limited learning ability, they fail to capture the correlations among multiple modalities. Recently, multimodal deep learning has been proposed and has proven superior at representing multimodal data owing to its high nonlinearity. However, how to reduce the redundant information in multimodal representations and how to account for the different complexities of different modalities in deep models, so as to learn compact and accurate representations, remain open problems. To address these problems, in this paper we propose a hashing-based orthogonal deep model to learn accurate and compact multimodal representations. The method can better capture the intra-modality and inter-modality correlations to learn accurate representations. Meanwhile, to make the representations compact, the hashing-based model generates compact hash codes, and the proposed orthogonal structure reduces the redundant information in the codes by imposing an orthogonal regularizer on the weighting matrices. We also theoretically prove that, in this case, the learned codes are guaranteed to be approximately orthogonal. Moreover, considering the different characteristics of different modalities, effective representations can be attained with different numbers of layers for different modalities. Comprehensive experiments on three real-world datasets demonstrate a substantial gain of our method on retrieval tasks compared with existing algorithms.
  • Keywords
    data structures; information retrieval; learning (artificial intelligence); matrix algebra; compact hash codes learning; hashing-based orthogonal deep model; inter-modality correlation; intra-modality correlation; multimodal data; multimodal deep learning; multimodal representation learning; orthogonal deep structure; orthogonal regularizer; weighting matrix; Binary codes; Complexity theory; Computer aided engineering; Correlation; Joints; Redundancy; Semantics; Deep learning; multimodal hashing; similarity search;
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2015.2455415
  • Filename
    7154455
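
The abstract mentions imposing an orthogonal regularizer on the weighting matrices so that the learned hash codes carry less redundant information. The following is a minimal illustrative sketch of that idea, not the authors' implementation. It assumes one common form of the penalty, the squared Frobenius norm of W^T W - I, and a single linear projection followed by a sign function in place of the paper's deep multimodal network. The names `orthogonal_penalty` and `hash_codes` are hypothetical.

```python
import numpy as np


def orthogonal_penalty(W):
    """Squared Frobenius norm of (W^T W - I).

    Assumed form of an orthogonal regularizer: it is zero exactly when
    the columns of W are orthonormal and grows as they become correlated.
    """
    k = W.shape[1]
    gram = W.T @ W
    return float(np.sum((gram - np.eye(k)) ** 2))


def hash_codes(X, W):
    """Project features with W and binarize each bit to -1 or +1."""
    return np.where(X @ W >= 0, 1, -1)


rng = np.random.default_rng(0)

# A projection with orthonormal columns (via reduced QR) and an
# unconstrained random projection, both mapping 16 features to 8 bits.
W_orth, _ = np.linalg.qr(rng.standard_normal((16, 8)))
W_rand = rng.standard_normal((16, 8))

X = rng.standard_normal((100, 16))   # 100 synthetic feature vectors
codes = hash_codes(X, W_orth)        # array of shape (100, 8)

print("penalty, orthonormal W:", orthogonal_penalty(W_orth))
print("penalty, random W:     ", orthogonal_penalty(W_rand))
```

In training, a term like this would typically be added to the main objective with a weighting coefficient, pushing the projection toward decorrelated bits. The synthetic data here only demonstrate that the penalty is near zero for an orthonormal projection and larger otherwise.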