• DocumentCode
    254351
  • Title
    What Are You Talking About? Text-to-Image Coreference
  • Author
    Kong, Chen ; Lin, Dahua ; Bansal, Mayank ; Urtasun, Raquel ; Fidler, Sanja
  • Author_Institution
    Tsinghua Univ., Beijing, China
  • fYear
    2014
  • fDate
    23-28 June 2014
  • Firstpage
    3558
  • Lastpage
    3565
  • Abstract
    In this paper we exploit natural sentential descriptions of RGB-D scenes in order to improve 3D semantic parsing. Importantly, in doing so, we reason about which particular object each noun/pronoun is referring to in the image. This allows us to utilize visual information in order to disambiguate the so-called coreference resolution problem that arises in text. Towards this goal, we propose a structure prediction model that exploits potentials computed from text and RGB-D imagery to reason about the class of the 3D objects, the scene type, as well as to align the nouns/pronouns with the referred visual objects. We demonstrate the effectiveness of our approach on the challenging NYU-RGBD v2 dataset, which we enrich with natural lingual descriptions. We show that our approach significantly improves 3D detection and scene classification accuracy, and is able to reliably estimate the text-to-image alignment. Furthermore, by using textual and visual information, we are also able to successfully deal with coreference in text, improving upon the state-of-the-art Stanford coreference system [15].
  • Keywords
    image resolution; natural language processing; text analysis; 3D semantic parsing; NYU-RGBD v2 dataset; RGB-D scenes; Stanford coreference system; natural lingual descriptions; natural sentential descriptions; noun-pronoun; structure prediction model; text-to-image alignment; text-to-image coreference; visual information; Accuracy; Image color analysis; Image segmentation; Semantics; Solid modeling; Three-dimensional displays; Visualization; 3D object detection; Text and images; scene understanding
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on
  • Conference_Location
    Columbus, OH
  • Type
    conf
  • DOI
    10.1109/CVPR.2014.455
  • Filename
    6909850