Regional Information Center for Science and Technology - Cross-media retrieval via fusing multi-modality and multi-grained data

Title of article :

Cross-media retrieval via fusing multi-modality and multi-grained data

Author/Authors :

Liu, Z.; Yuan, S.; Pei, X.; Gao, S.; Han, H.
All authors: Shandong Provincial Key Laboratory of Digital Media Technology, School of Computer Science and Technology, Shandong University of Finance and Economics

From page :

1645

To page :

1669

Abstract :

Traditional cross-media retrieval methods mainly focus on coarse-grained data that reflect global characteristics, while ignoring fine-grained descriptions of local details. Moreover, they cannot accurately describe the correlations between the anchor and irrelevant data. To address these problems, this paper proposes a dual framework that fuses coarse-grained and fine-grained features and introduces a multi-margin triplet loss. It consists of 1) Framework I, a multi-grained data fusion framework based on a Deep Belief Network (DBN), and 2) Framework II, a multi-modality data fusion framework based on the multi-margin triplet loss function. In Framework I, coarse-grained and fine-grained features are fused by a joint Restricted Boltzmann Machine (RBM), and the fused features are input into Framework II. In Framework II, we propose the multi-margin triplet loss, under which data belonging to different modalities and semantic categories are pushed away from the anchor with different margins. Experimental results on different datasets show that the proposed method achieves better cross-media retrieval performance than the compared methods. Furthermore, ablation experiments verify the effectiveness of both the multi-grained fusion strategy and the multi-margin triplet loss function.
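The multi-margin idea can be illustrated as a standard triplet hinge loss in which each negative sample uses the margin assigned to its own group. This is a minimal sketch under stated assumptions, not the authors' formulation: the Euclidean distance, the hypothetical group labels ("same_modality", "cross_modality"), the margin values, and averaging over negatives are all illustrative choices.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def multi_margin_triplet_loss(
    anchor: Vector,
    positive: Vector,
    negatives: List[Tuple[Vector, str]],
    margins: Dict[str, float],
) -> float:
    """Average triplet hinge loss where each negative uses its group's margin.

    Assumption (hypothetical): every negative is tagged with a group name,
    and `margins` maps that group name to the margin it must respect.
    """
    if not negatives:
        return 0.0
    d_pos = euclidean(anchor, positive)
    total = 0.0
    for neg, group in negatives:
        d_neg = euclidean(anchor, neg)
        # Penalise the negative unless it lies at least margins[group]
        # farther from the anchor than the positive does.
        total += max(0.0, d_pos - d_neg + margins[group])
    return total / len(negatives)


if __name__ == "__main__":
    # Hypothetical margin set: a larger margin for one group of negatives.
    margins = {"same_modality": 0.5, "cross_modality": 1.0}
    anchor, positive = [0.0, 0.0], [1.0, 0.0]
    negatives = [([2.0, 0.0], "same_modality"), ([0.0, 1.5], "cross_modality")]
    print(multi_margin_triplet_loss(anchor, positive, negatives, margins))  # 0.25
```

In this example the first negative already satisfies its 0.5 margin and contributes nothing, while the second violates its larger 1.0 margin and contributes 0.5, giving an average of 0.25. Using a single shared margin instead would reduce this to an ordinary triplet loss.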

Keywords :

Cross-media retrieval, Multi-modality data, Multi-grained data, Multi-margin triplet loss, Margin set

Journal title :

Scientia Iranica (Transactions D: Computer Science and Electrical Engineering)

Record number :

2752940

Link To Document :

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=2752940