Regional Information Center for Science and Technology (RICEST) - Enhancing sparse voice annotation for semantic retrieval of personal photos by continuous space word representations

DocumentCode :

730835

Title :

Enhancing sparse voice annotation for semantic retrieval of personal photos by continuous space word representations

Author :

Yuan-ming Liou ; Hung-tsung Lu ; Yi-sheng Fu ; Winston Hsu ; Lin-shan Lee

Author_Institution :

Grad. Inst. of Commun. Eng., Nat. Taiwan Univ., Taipei, Taiwan

fYear :

2015

fDate :

19-24 April 2015

Firstpage :

5341

Lastpage :

5345

Abstract :

It is very attractive for the user to retrieve photos from a huge collection using high-level personal queries (e.g., uncle Bill's house), but technically very challenging. Previous work proposed a set of approaches to achieve this goal, assuming only 30% of the photos are annotated by sparse spoken descriptions when the photos are taken. This includes fusing the sparse spontaneously spoken features with visual features of the photos by non-negative matrix factorization (NMF), and enhancing the results with a two-layer mutually reinforced random walk. However, because the speech annotation is very sparse, the retrieval is very often dominated by the much more complete visual features. In this paper, we propose to use continuous space word representations to extend the sparse speech information and expand the photo representation to enhance the retrieval model. Very encouraging improvements were observed in the preliminary experiments.
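
Illustrative note (not part of the original record): the abstract describes two ideas, extending sparse spoken annotations through word-vector similarity and fusing them with visual features by NMF. The sketch below shows one plausible way these could fit together; it is not the authors' implementation. The vocabulary, the toy word vectors, the `top_k` neighbour rule, and the function names `expand_annotations` and `nmf` are all assumptions made for illustration, and the two-layer random walk is not covered.

```python
import numpy as np

def expand_annotations(T, word_vectors, top_k=1):
    """Spread each annotated word's weight onto its most similar words.

    T: (n_photos, n_words) non-negative term weights, mostly zero.
    word_vectors: (n_words, dim) continuous-space word representations.
    Hypothetical rule: keep each word's top_k cosine neighbours (ties may keep more).
    """
    unit = word_vectors / np.linalg.norm(word_vectors, axis=1, keepdims=True)
    sim = np.clip(unit @ unit.T, 0.0, None)   # non-negative, as NMF requires
    np.fill_diagonal(sim, 0.0)                # a word is not its own neighbour
    cutoff = np.sort(sim, axis=1)[:, -top_k][:, None]
    sim = np.where(sim >= cutoff, sim, 0.0)   # drop everything below the k-th best
    return T + T @ sim

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Plain multiplicative-update NMF: V is approximated by W @ H, with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data (all assumed): 4 photos, 3 visual features, 4-word vocabulary.
vocab = ["house", "home", "beach", "sea"]
word_vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
T = np.zeros((4, 4))
T[0, 0] = 1.0                                  # only photo 0 is annotated ("house")
visual = np.random.default_rng(1).random((4, 3))

expanded = expand_annotations(T, word_vectors, top_k=1)
V = np.hstack([visual, expanded])              # fused photo-by-feature matrix
W, H = nmf(V, rank=2)                          # rows of W: latent photo representations
```

In this toy setup the single annotation "house" also places weight on "home", so the speech part of the fused matrix is slightly less sparse before factorization. Photos without annotations keep all-zero speech columns and are represented through their visual features alone.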

Keywords :

feature extraction; image enhancement; image fusion; image representation; image retrieval; matrix decomposition; speech enhancement; NMF; continuous space word representation; nonnegative matrix factorization; personal photo semantic retrieval; photo representation; sparse speech information; sparse spontaneously spoken feature fusion; sparse voice annotation enhancement; speech annotation; two-layer mutually reinforced random walk; Feature extraction; Lattices; Semantics; Sparse matrices; Speech; Speech enhancement; Visualization; fused features; image retrieval; nonnegative matrix factorization; speech annotation; word representation;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location :

South Brisbane, QLD

Type :

conf

DOI :

10.1109/ICASSP.2015.7178991

Filename :

7178991

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=730835