• DocumentCode
    2995188
  • Title
    Generating Image Descriptions Using Semantic Similarities in the Output Space
  • Author
    Verma, Yashaswi ; Gupta, Arpan ; Mannem, Prashanth ; Jawahar, C.V.
  • Author_Institution
    Int. Inst. of Inf. Technol., Hyderabad, India
  • fYear
    2013
  • fDate
    23-28 June 2013
  • Firstpage
    288
  • Lastpage
    293
  • Abstract
    Automatically generating meaningful descriptions for images has recently emerged as an important area of research. In this direction, a nearest-neighbour based generative phrase prediction model (PPM) proposed by Gupta et al. (2012) was shown to achieve state-of-the-art results on the PASCAL sentence dataset, thanks to the simultaneous use of three different sources of information (i.e., visual clues, corpus statistics and available descriptions). However, their model does not utilize semantic similarities among phrases, which might help in relating semantically similar phrases during phrase relevance prediction. In this paper, we extend their model by considering inter-phrase semantic similarities. To compute the similarity between two phrases, we consider similarities among their constituent words, determined using WordNet. We also reformulate their objective function for parameter learning by penalizing each pair of phrases unevenly, in a manner similar to that used in structured prediction. Various automatic and human evaluations are performed to demonstrate the advantage of our "semantic phrase prediction model" (SPPM) over PPM.
  • Keywords
    image matching; text analysis; text detection; PASCAL sentence dataset; SPPM; WordNet; automatic evaluations; automatic meaningful image description generation; corpus statistics; human evaluations; information sources; interphrase semantic similarities; nearest-neighbour based generative PPM; nearest-neighbour based generative phrase prediction model; output space; parameter learning; phrase relevance prediction; semantic phrase prediction model; visual clues; Detectors; Equations; Hidden Markov models; Mathematical model; Predictive models; Semantics; Visualization; Image Description; Semantic Similarity;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition Workshops (CVPRW), 2013 IEEE Conference on
  • Conference_Location
    Portland, OR
  • Type
    conf
  • DOI
    10.1109/CVPRW.2013.50
  • Filename
    6595889