DocumentCode :
3748738
Title :
Common Subspace for Model and Similarity: Phrase Learning for Caption Generation from Images
Author :
Yoshitaka Ushiku;Masataka Yamaguchi;Yusuke Mukuta;Tatsuya Harada
Author_Institution :
Univ. of Tokyo, Tokyo, Japan
fYear :
2015
Firstpage :
2668
Lastpage :
2676
Abstract :
Generating captions to describe images is a fundamental problem that combines computer vision and natural language processing. Recent works focus on descriptive phrases, such as "a white dog," to explain the visual composites of an input image. Phrases can not only express objects, attributes, events, and their relations but can also reduce visual complexity. A caption for an input image can be generated by connecting estimated phrases using a grammar model. However, because phrases are combinations of various words, the number of phrases is much larger than the number of single words. Consequently, the accuracy of phrase estimation suffers from having too few training samples per phrase. In this paper, we propose a novel phrase-learning method: Common Subspace for Model and Similarity (CoSMoS). To overcome the shortage of training samples, CoSMoS obtains a subspace in which (a) all feature vectors associated with the same phrase are mapped close to one another, (b) a classifier is learned for each phrase, and (c) training samples are shared among co-occurring phrases. Experimental results demonstrate that our system is more accurate than those in earlier work and that accuracy improves as the size of the web-sourced dataset increases.
Keywords :
"Training","Visualization","Learning systems","Neural networks","Grammar","Scalability","Feature extraction"
Publisher :
ieee
Conference_Titel :
Computer Vision (ICCV), 2015 IEEE International Conference on
Electronic_ISSN :
2380-7504
Type :
conf
DOI :
10.1109/ICCV.2015.306
Filename :
7410663