• DocumentCode
    3672315
  • Title

    Mind´s eye: A recurrent visual representation for image caption generation

  • Author

    Xinlei Chen;C. Lawrence Zitnick

  • Author_Institution
    Carnegie Mellon University, USA
  • fYear
    2015
  • fDate
    6/1/2015 12:00:00 AM
  • Firstpage
    2422
  • Lastpage
    2431
  • Abstract
    In this paper we explore the bi-directional mapping between images and their sentence-based descriptions. Critical to our approach is a recurrent neural network that attempts to dynamically build a visual representation of the scene as a caption is being generated or read. The representation automatically learns to remember long-term visual concepts. Our model is capable of both generating novel captions given an image, and reconstructing visual features given an image description. We evaluate our approach on several tasks. These include sentence generation, sentence retrieval and image retrieval. State-of-the-art results are shown for the task of generating novel image descriptions. When compared to human generated captions, our automatically generated captions are equal to or preferred by humans 21.0% of the time. Results are better than or comparable to state-of-the-art results on the image and sentence retrieval tasks for methods using similar visual features.
  • Keywords
    "Visualization","Computational modeling","Recurrent neural networks","Training","Bidirectional control","Entropy"
  • Publisher
    ieee
  • Conference_Titel
    Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on
  • Electronic_ISBN
    1063-6919
  • Type

    conf

  • DOI
    10.1109/CVPR.2015.7298856
  • Filename
    7298856