Regional Information Center for Science and Technology - Multimodal information fusion for video concept detection

DocumentCode :

3043867

Title :

Multimodal information fusion for video concept detection

Author :

Wu, Yi ; Lin, Ching-Yung ; Chang, Edward Y. ; Smith, John R.

Author_Institution :

Dept. of Electr. & Comput. Eng., California Univ., USA

Volume :

fYear :

2004

fDate :

24-27 Oct. 2004

Firstpage :

2391

Abstract :

Video media carries multimodal information, including visual, audio, and textual data. Considerable research has focused on using multimodal features to better understand video content. However, many problems remain, such as how to combine multimodal features and what effects different combinations have. In this paper, we propose to find the optimal combination of multimodal information in order to improve the performance of video concept detection using two methods: gradient-descent-optimization linear fusion and super-kernel nonlinear fusion. Gradient-descent-optimization linear fusion learns an optimal weighted linear combination of single modalities by fusing individual kernel matrices with gradient descent techniques. Super-kernel nonlinear fusion first trains a separate classifier for each modality. Once the individual models have been trained, it learns an optimal nonlinear combination of them by fusing the single-modality classifiers. Our experiments show that both methods improve performance significantly on TREC-Video 2003 benchmarks.
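
The linear fusion idea in the abstract (a weighted combination of per-modality kernel matrices, with weights found by gradient descent) can be illustrated with a minimal sketch. The abstract does not state the objective the authors optimize, so everything specific here is an assumption made for illustration: the squared distance between the fused kernel and the ideal label kernel y y^T as the loss, the softmax parameterization that keeps weights non-negative and summing to one, the toy kernels, and the hypothetical function names `fuse_kernels` and `learn_fusion_weights`.

```python
import math


def softmax(theta):
    """Map unconstrained parameters to non-negative weights that sum to 1."""
    peak = max(theta)
    exps = [math.exp(t - peak) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_kernels(kernels, weights):
    """Weighted sum of per-modality kernel matrices (lists of lists)."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


def learn_fusion_weights(kernels, labels, steps=500, lr=0.5):
    """Gradient descent on the fusion weights.

    Assumed loss (not taken from the paper): mean squared difference
    between the fused kernel and the ideal kernel y_i * y_j.
    """
    n = len(labels)
    target = [[labels[i] * labels[j] for j in range(n)] for i in range(n)]
    theta = [0.0] * len(kernels)  # start from uniform weights
    for _ in range(steps):
        w = softmax(theta)
        fused = fuse_kernels(kernels, w)
        # Gradient of the loss with respect to each weight.
        grad_w = [
            2.0 * sum((fused[i][j] - target[i][j]) * K[i][j]
                      for i in range(n) for j in range(n)) / (n * n)
            for K in kernels
        ]
        # Chain rule through the softmax parameterization.
        avg = sum(wm * g for wm, g in zip(w, grad_w))
        theta = [t - lr * wk * (g - avg)
                 for t, wk, g in zip(theta, w, grad_w)]
    return softmax(theta)


# Toy data: four shots, two positive and two negative for some concept.
labels = [1, 1, -1, -1]
# Hypothetical "visual" kernel that matches the labels perfectly.
k_visual = [[a * b for b in labels] for a in labels]
# Hypothetical "audio" kernel that carries no label information (identity).
k_audio = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

weights = learn_fusion_weights([k_visual, k_audio], labels)
print(weights)  # the visual kernel should receive the larger weight
```

In this toy setting the informative kernel ends up with most of the weight. The fused matrix returned by `fuse_kernels` could then be passed to any kernel classifier that accepts a precomputed kernel.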

Keywords :

gradient methods; image classification; matrix algebra; optimisation; video signal processing; TREC-Video 2003 benchmark; gradient-descent-optimization linear fusion; linear combination; multimodal information; single modality classifier; super-kernel nonlinear fusion; textual data; video concept detection; video media; Data mining; Face detection; Feature extraction; Gunshot detection systems; Hidden Markov models; Kernel; Optimization methods; Speech analysis; Speech recognition;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Image Processing, 2004. ICIP '04. 2004 International Conference on

ISSN :

1522-4880

Print_ISBN :

0-7803-8554-3

Type :

conf

DOI :

10.1109/ICIP.2004.1421582

Filename :

1421582

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3043867